So this issue is more subtle than this heavy-handed policy of parse-unparse
symmetry.

There are reasons why parse may want to do error recovery, so that it does
not fail to parse a whole file of data at the first malformed record in it.
In doing so it must produce something invalid corresponding to the
malformed record, and that element is never expected to unparse.
If a file has records and Daffodil is able to compute the length of the
records reliably, then this sort of error recovery is possible.
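
To make the idea concrete, here is a rough sketch in plain Python. It is not
Daffodil code and uses no Daffodil API; the fixed record length, the Record
class, and the parse_record, parse_file and unparse functions are all
hypothetical names invented for illustration. It only shows the shape of the
behavior: a malformed record becomes an invalid placeholder so the parse can
continue, and the unparse refuses that placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional

RECORD_LEN = 4  # assumption: record length is known and reliable


@dataclass
class Record:
    raw: bytes            # original bytes of the record
    value: Optional[int]  # parsed value, or None for a malformed record

    @property
    def is_valid(self) -> bool:
        return self.value is not None


def parse_record(raw: bytes) -> Record:
    """Parse one record of ASCII digits, recovering instead of failing."""
    if raw.isdigit():
        return Record(raw, int(raw.decode("ascii")))
    # Error recovery: keep an invalid placeholder and carry on.
    return Record(raw, None)


def parse_file(data: bytes) -> List[Record]:
    """Split the data by the known record length and parse every record."""
    return [
        parse_record(data[i:i + RECORD_LEN])
        for i in range(0, len(data), RECORD_LEN)
    ]


def unparse(records: List[Record]) -> bytes:
    """Serialize the records, refusing any invalid placeholder."""
    bad = [i for i, r in enumerate(records) if not r.is_valid]
    if bad:
        raise ValueError(f"cannot unparse invalid records at indexes {bad}")
    return b"".join(
        str(r.value).zfill(RECORD_LEN).encode("ascii") for r in records
    )
```

Here the parse of a file with one bad record still succeeds, but the result
does not unparse until the invalid record is filtered out or repaired, which
is exactly the asymmetry a strict parse-unparse symmetry policy would flag
as a schema bug.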

On Wed, Sep 17, 2025 at 8:43 AM Mike Beckerle <[email protected]> wrote:

> Perhaps we should just have a policy that it is a schema bug if data
> parses but does not unparse regardless of whether the infoset created by
> the parse is valid or not.
>
> This of course assumes a schema designed to support both parse and
> unparse. Some schemas will intentionally be just for parse, but in the
> Cyberian use case we've only seen parse+unparse schemas.
>
>
> On Tue, Sep 16, 2025 at 3:09 PM Steve Lawrence <[email protected]>
> wrote:
>
>> I haven't tested it, but looking at the code I think this is already the
>> case.
>>
>> The CLI exits with a non-zero exit code for parse or validation errors:
>>
>>
>> https://github.com/apache/daffodil/blob/main/daffodil-cli/src/main/scala/org/apache/daffodil/cli/Main.scala#L1295-L1298
>>
>> The ParseResult.isError API returns true for either parse or validation
>> errors:
>>
>>
>> https://github.com/apache/daffodil/blob/main/daffodil-core/src/main/scala/org/apache/daffodil/runtime1/iapi/DFDLParserUnparser.scala#L217
>>
>> It looks like this behavior goes back at least to Daffodil 3.0.0, so any
>> modern version should have this behavior. If users are running into this,
>> it might mean they aren't checking the CLI exit code, or are using the API
>> and explicitly testing ParseResult.isProcessorError instead of isError, or
>> there is a bug in Daffodil.
>>
>> Or maybe they are parsing without validation enabled? In which case maybe
>> we just need better documentation somewhere? Some schemas might not
>> unparse well-formed but invalid data. In these cases, it might be
>> important to parse with validation enabled and check for validation
>> errors, or it could lead to unparse failures.
>>
>>
>> On 2025-09-16 02:45 PM, Mike Beckerle wrote:
>> > It seems like many people are surprised when data that successfully
>> > parses does not successfully unparse.
>> >
>> > This happens when the parsed data is well-formed, but values do not obey
>> > the XSD facets. That is, the XML result from parsing is created, but it
>> > is invalid.
>> >
>> > Depending on the DFDL schema, such data may not unparse successfully.
>> >
>> > Users who test schemas with a simple parse -> unparse process and test
>> > data that has this well-formed-but-invalid behavior may get the
>> > impression that there is a problem with the schema, but really it is
>> > just that validation errors coming out of the parse are not being
>> > escalated into true errors. This behavior of Daffodil holds regardless
>> > of whether Daffodil is configured to do validation or not, as
>> > validation errors are never parse errors. They are effectively just
>> > warnings. I think this is unintuitive to many users, who expect that a
>> > DFDL parse cannot produce invalid XML.
>> >
>> > A test process that does parse -> XSD Validate -> unparse is correct.
>> > The XSD Validate step in the middle would block such messages as
>> > invalid, so they would never reach the unparser and fail there.
>> >
>> > With that background, should we have an option where at the end of a
>> > Daffodil parse, if there are validation errors we can cause the entire
>> > parse to be considered a failure? This is not the same as escalating
>> > individual validation errors into parse errors, as that would affect
>> > backtracking behavior. This is a separate final check once the DFDL
>> > Infoset has been created.
>> >
>> > API users of Daffodil can of course inspect output for validation errors
>> > and do this themselves. I just think they are not aware that this is
>> > needed.
>> >
>> > Thoughts?
>> >
>> >
>> >
>> > Mike Beckerle
>> > Apache Daffodil PMC | daffodil.apache.org
>> > OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
>> > Owl Cyber Defense | www.owlcyberdefense.com
>> >
>>
>>
