I feel like as long as the release notes are clear about how to maintain
compatibility, it's not unreasonable to put this in a 3.11.0 release. We've
changed behavior of many things before (especially when it's related to fixing
non-spec compliant bugs, which technically this is) as long as there were
relatively simple workarounds that were clearly explain in release notes or we
didn't think many schemas relied on the old behavior.
In this case, the release can be something like:
To maintain the previous behavior, change choice branches with discriminators
like this:
<choice>
<sequence ... >
<!-- discriminator -->
<!-- content -->
</sequence>
...
</choice>
To this:
<choice>
<sequence>
<sequence ...>
<!-- content -->
</sequence>
<sequence>
<!-- discriminator -->
</sequence>
</sequence>
...
</choice>
There are usually better ways to rewrite a schema, like using choice dispatch
instead of speculative parsing, but this is a generic and relatively simple
solution for all schemas that do this. We should also add a warning if a
discriminator is placed directly on a choice branch, since usually what a schema
author intended was to evaluate the discriminator at the beginning of the branch
and it should simply be wrapped in a sequence.
Though, I'm not against a toggle flag if we really want to maintain backwards
compatibility with the old broken behavior, and then 4.0.0 can toggle the switch
to the correct behavior.
If we still think this requires a major version bump, I would suggest we just
wait until 4.0.0 with the API, scala 3, and other large breaking changes. Doing
a single 4.0.0 release for this and then another major version bump to 5.0.0 for
the API and scala 3 changes seems unnecessary. I also don't know of any schemas
really waiting on this fix, so we don't really need to rush it. We're just
trying to merge this since it's been in the merge queue for a long time.
On 2025-05-06 12:42 PM, Mike Beckerle wrote:
Plan on a rather large release-note section, perhaps even a whole
daffodil-site page of its own to discuss the changes required to schemas
due to this bug-fix.
This will need a substantial worked example, discussion of the
false-positives problem where malformed data might be masked if a branch
which should have been positively discriminated instead backtracks and a
subsequent branch of that choice is somehow satisfied by default, because
the schema was written with assumption that the discriminator would be
evaluated, but it wasn't. This is a pretty big deal.
Since this is an incompatible breaking change we have to think about
version numbering here. Last chatter I heard we are planning for a 3.11.0,
then 4.0.0.
But 3.11 is just scala version change, and 4.0.0 is Scala 3 and required
API changes, but neither of these have any a functional change to schemas
and DFDL language behavior. (I think. Correct me if I am wrong.)
Does this incompatible breaking change require us to then create a release
5.0.0 or do we want to muddy the waters of 4.0.0 by adding this functional
change in as well?
Also, we should think about if we want to have a switch on the fix for this
behavior change which defaults to "fixed" behavior, but the release note
would tell people to either fix their schemas or turn on the "beforeFix"
behavior with the switch. Or we could introduce this fix with the switch
set to "beforeFix" behavior, but then flip the default when we're ready to
release daffodil version 5.0.0?
On Tue, May 6, 2025 at 11:57 AM Adams, Joshua
<jad...@owlcyberdefense.com.invalid> wrote:
https://issues.apache.org/jira/browse/DAFFODIL-1971
I believe that I have the work for this pretty much wrapped up in this PR
(other than rebasing onto main) -
https://github.com/apache/daffodil/pull/987
Before finishing that, I figured it would be worth discussing how these
changes can break existing schemas. Schemas that have been using
discriminator expressions with speculative choices can be particularly
effected as discriminator expressions are now evaluated regardless of the
success of parsing the rest of the choice branch. Prior to this change the
discriminator expression would only be evaluated if the rest of the choice
branch parsed successfully.
3 out of the ~80 DFDL schema projects we have in our regression test suite
are effected by this change.
*
GIF
*
This schema breaks because it uses discriminator expressions to test if
Byte elements match some expected value. Prior to this change the
expression would only run if the rest of the choice branch successfully
parsed and the Byte elements checked by the expression would match. With
this change a speculative choice branch will fail to parse but the
discriminator expression will still be evaluated and fail to match the Byte
elements.
*
Suggested fix: Use choice dispatch based on these expected Byte elements
and remove speculative parsing and discriminators altogether.
*
https://github.com/DFDLSchemas/GIF/pull/8/commits/afb10382514753c5dbe6fb69f77e0ac6da6505ab
*
Edifact
*
This schema used discriminator expressions that always evaluated to true.
Prior to this change this expression only evaluated when the rest of the
choice branch parsed successfully. With this change the expression will be
run even if the choice branch fails, causing obvious issues.
*
Suggested fix: Use a nested sequence to contain the choice branch body
followed by another nested sequence for the discriminator expression. This
will cause the discriminator expression to only be ran if the the first
nested sequence containing the choice branch body parses.
*
https://github.com/DFDLSchemas/EDIFACT/pull/9/commits/476fe78e424e01580d5caf302a87aaaf703c609b
*
JPEG2000 (Internal schema)
*
This schema also uses discriminator expressions to check if a ID element
matches expected values
*
Suggested fix: Use choice dispatch based on the ID element and remove
speculative parsing and discriminators altogether.
With these suggested changes all schemas in our regression suite pass with
the changes for DAFFODIL-1971. I'd like to get this merged into the next
release if possible now that I have confirmed viable work-arounds for at
least the problematic schemas that we are aware of.
Thoughts?