Perhaps we should just have a policy that it is a schema bug if data parses but does not unparse, regardless of whether the infoset created by the parse is valid.

This of course assumes a schema designed to support both parse and unparse.
Some schemas will intentionally be parse-only, but in the Cyberian use case
we've only seen parse+unparse schemas.

On Tue, Sep 16, 2025 at 3:09 PM Steve Lawrence wrote:

> I haven't tested it, but looking at the code I think this is already the
> case.
>
> The CLI exits with a non-zero exit code for parse or validation errors:
>
> https://github.com/apache/daffodil/blob/main/daffodil-cli/src/main/scala/org/apache/daffodil/cli/Main.scala#L1295-L1298
>
> The ParseResult.isError API returns true for either parse or validation
> errors:
>
> https://github.com/apache/daffodil/blob/main/daffodil-core/src/main/scala/org/apache/daffodil/runtime1/iapi/DFDLParserUnparser.scala#L217
>
> It looks like this behavior goes back at least to Daffodil 3.0.0, so any
> modern version should have this behavior. If users are running into this,
> it might mean they aren't checking the CLI exit code, or they are using
> the API and explicitly testing ParseResult.isProcessorError instead of
> isError, or there is a bug in Daffodil.
>
> Or maybe they are parsing without validation enabled? In which case maybe
> we just need better documentation somewhere? Some schemas might not
> unparse well-formed but invalid data. In these cases, it might be
> important to parse with validation enabled and check for validation
> errors, or it could lead to unparse failures.
>
> On 2025-09-16 02:45 PM, Mike Beckerle wrote:
> > It seems like many people are surprised when data that successfully
> > parses does not successfully unparse.
> >
> > This happens when the parsed data is well-formed, but values do not obey
> > the XSD facets. That is, the XML result from parsing is created, but it
> > is invalid.
> >
> > Depending on the DFDL schema, such data may not unparse successfully.
> >
> > Users who test schemas with a simple parse -> unparse process and test
> > data that has this well-formed-but-invalid behavior may get the
> > impression that there is a problem with the schema, but really it is
> > just that validation errors coming out of the parse are not being
> > escalated into true errors. This behavior of Daffodil holds regardless
> > of whether Daffodil is configured to do validation or not, as validation
> > errors are never parse errors. They are effectively just warnings. I
> > think this is unintuitive to many users, who expect that a DFDL parse
> > cannot produce invalid XML.
> >
> > A test process that does parse -> XSD Validate -> unparse is correct.
> > The XSD Validate step in the middle would block such messages as invalid
> > and they'd never get to the unparser, so they would not fail in the
> > unparser.
> >
> > With that background, should we have an option where at the end of a
> > Daffodil parse, if there are validation errors, we can cause the entire
> > parse to be considered a failure? This is not the same as escalating
> > individual validation errors into parse errors, as that would affect
> > backtracking behavior. This is a separate final check once the DFDL
> > Infoset has been created.
> >
> > API users of Daffodil can of course inspect output for validation errors
> > and do this themselves. I just think they are not aware that this is
> > needed.
> >
> > Thoughts?
> >
> > Mike Beckerle
> > Apache Daffodil PMC | daffodil.apache.org
> > OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
> > Owl Cyber Defense | www.owlcyberdefense.com
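
For anyone following the thread, here is a rough sketch of the parse -> validate -> unparse test process described above, with the "final check" applied once the infoset exists. It is only an illustration: ParseOutcome, round_trip, toy_parse and toy_unparse are hypothetical names made up for this sketch and are not Daffodil's API. The toy format is a single byte whose value has a pretend facet of maxInclusive 9.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ParseOutcome:
    """Hypothetical stand-in for a parse result; NOT Daffodil's actual API."""
    infoset: Optional[str]
    processor_errors: List[str] = field(default_factory=list)
    validation_errors: List[str] = field(default_factory=list)

    @property
    def is_error(self) -> bool:
        # Final check after the infoset is built: either kind of
        # diagnostic makes the parse as a whole count as failed.
        return bool(self.processor_errors or self.validation_errors)


def round_trip(data: bytes,
               parse: Callable[[bytes], ParseOutcome],
               unparse: Callable[[str], bytes]) -> str:
    """Classify one test input as parse-failed, invalid, unparse-failed or ok."""
    outcome = parse(data)
    if outcome.processor_errors or outcome.infoset is None:
        return "parse-failed"
    if outcome.validation_errors:
        # Well-formed but invalid: stop here, never hand it to the unparser.
        return "invalid"
    try:
        unparse(outcome.infoset)
    except ValueError:
        # A valid infoset that will not unparse points at the schema.
        return "unparse-failed"
    return "ok"


def toy_parse(data: bytes) -> ParseOutcome:
    """Toy parser (assumption): one byte, pretend facet maxInclusive 9."""
    if len(data) != 1:
        return ParseOutcome(None, processor_errors=["expected exactly one byte"])
    value = data[0]
    errors = [] if value <= 9 else [f"value {value} exceeds maxInclusive 9"]
    return ParseOutcome(str(value), validation_errors=errors)


def toy_unparse(infoset: str) -> bytes:
    """Toy unparser (assumption): refuses values the facet would reject."""
    value = int(infoset)
    if value > 9:
        raise ValueError("cannot unparse value > 9")
    return bytes([value])


if __name__ == "__main__":
    for sample in (b"\x05", b"\x20", b""):
        print(sample, round_trip(sample, toy_parse, toy_unparse))
```

With the validation gate in place, the byte 0x20 is reported as invalid at the parse stage instead of surfacing later as a confusing unparse failure. Without the gate, the same input would reach toy_unparse and fail there.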
