Plan on a rather large release-note section, perhaps even a whole daffodil-site page of its own to discuss the changes required to schemas due to this bug-fix.
This will need a substantial worked example, discussion of the false-positives problem where malformed data might be masked if a branch which should have been positively discriminated instead backtracks and a subsequent branch of that choice is somehow satisfied by default, because the schema was written with assumption that the discriminator would be evaluated, but it wasn't. This is a pretty big deal. Since this is an incompatible breaking change we have to think about version numbering here. Last chatter I heard we are planning for a 3.11.0, then 4.0.0. But 3.11 is just scala version change, and 4.0.0 is Scala 3 and required API changes, but neither of these have any a functional change to schemas and DFDL language behavior. (I think. Correct me if I am wrong.) Does this incompatible breaking change require us to then create a release 5.0.0 or do we want to muddy the waters of 4.0.0 by adding this functional change in as well? Also, we should think about if we want to have a switch on the fix for this behavior change which defaults to "fixed" behavior, but the release note would tell people to either fix their schemas or turn on the "beforeFix" behavior with the switch. Or we could introduce this fix with the switch set to "beforeFix" behavior, but then flip the default when we're ready to release daffodil version 5.0.0? On Tue, May 6, 2025 at 11:57 AM Adams, Joshua <jad...@owlcyberdefense.com.invalid> wrote: > https://issues.apache.org/jira/browse/DAFFODIL-1971 > > I believe that I have the work for this pretty much wrapped up in this PR > (other than rebasing onto main) - > https://github.com/apache/daffodil/pull/987 > > Before finishing that, I figured it would be worth discussing how these > changes can break existing schemas. Schemas that have been using > discriminator expressions with speculative choices can be particularly > effected as discriminator expressions are now evaluated regardless of the > success of parsing the rest of the choice branch. Prior to this change the > discriminator expression would only be evaluated if the rest of the choice > branch parsed successfully. > > 3 out of the ~80 DFDL schema projects we have in our regression test suite > are effected by this change. > > > * > GIF > * > This schema breaks because it uses discriminator expressions to test if > Byte elements match some expected value. Prior to this change the > expression would only run if the rest of the choice branch successfully > parsed and the Byte elements checked by the expression would match. With > this change a speculative choice branch will fail to parse but the > discriminator expression will still be evaluated and fail to match the Byte > elements. > * > Suggested fix: Use choice dispatch based on these expected Byte elements > and remove speculative parsing and discriminators altogether. > * > > https://github.com/DFDLSchemas/GIF/pull/8/commits/afb10382514753c5dbe6fb69f77e0ac6da6505ab > > > * > Edifact > * > This schema used discriminator expressions that always evaluated to true. > Prior to this change this expression only evaluated when the rest of the > choice branch parsed successfully. With this change the expression will be > run even if the choice branch fails, causing obvious issues. > * > Suggested fix: Use a nested sequence to contain the choice branch body > followed by another nested sequence for the discriminator expression. This > will cause the discriminator expression to only be ran if the the first > nested sequence containing the choice branch body parses. > * > > https://github.com/DFDLSchemas/EDIFACT/pull/9/commits/476fe78e424e01580d5caf302a87aaaf703c609b > > > * > JPEG2000 (Internal schema) > * > This schema also uses discriminator expressions to check if a ID element > matches expected values > * > Suggested fix: Use choice dispatch based on the ID element and remove > speculative parsing and discriminators altogether. > > With these suggested changes all schemas in our regression suite pass with > the changes for DAFFODIL-1971. I'd like to get this merged into the next > release if possible now that I have confirmed viable work-arounds for at > least the problematic schemas that we are aware of. > > Thoughts? >