Re: [PR] Fix ProcessorFactory.withDistinguishedRootNode and isError [daffodil]

via GitHub Wed, 29 Nov 2023 05:48:25 -0800


stevedlawrence commented on code in PR #1122:
URL: https://github.com/apache/daffodil/pull/1122#discussion_r1409308168



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/compiler/Compiler.scala:
##########
@@ -92,13 +89,10 @@ final class ProcessorFactory private (
       validateDFDLSchemas,
       checkAllTopLevel,
       tunables,
-      Some(sset),

Review Comment:
   > I would like to know more about the use case(s) where someone gets a 
ProcessorFactory but abstains from calling isError on purpose. What is the 
justification or motivation for such use case(s)?
   
   Users can do this if they are confident they schema will compile without any 
errors. The `ProcessorFactory.onPath` function asserts that `isError` is false, 
so if a user doesn't call `isError`, then `onPath` will, and all needed work 
will be done then. But if `isError` is true, `onPath` will throw an exception 
saying this function should not be called if `isError` is true.
   
   Note that this is why some tests don't call `isError`. The test just assumes 
there isn't a compile error, and if there is then an assertion exception will 
be thrown and the test will fail. It would probably be better to assert that 
`isError` is false, but it's not technically required for the test to pass/fail 
correctly.
   
   Note that another side effect of this change is that if a user calls 
`ProcessorFactory.getDiagnostics` prior to calling `isError`, then no work will 
have been done and so there will be no diagnostics. This was actually the case 
with the CLI and TDML runner, hence a couple of changes in this PR. We could 
avoid this by modifying `getDiagnostics` to call `isError` before returning the 
diags.
   
   > Is the justification that a user might want to call various withXXX 
methods on the ProcessorFactory to fine tune the ProcessorFactory's parameters 
and configuration before actually using the ProcessorFactory to get a Processor?
   
   Yeah, that's why this change was made. With only the change to create a new 
`SchemaSet` when `withDistinguishedRootNode` is called, it means if we have 
something like this:
   
   ```scala
   val pf = compiler.compileSource(uri).withDistinguishedRootNode(name, 
namespace)
   pf.isError
   ```
   Then we create and evaluate two `SchemaSet`s, one when `compileSource` is 
called (because it calls `isError`) and the other when `pf.isError` is called. 
Note that a user could do something like this:
   
   ```scala
   val pf = compiler.compileSource(uri)
              .withDistinguishedRootNode(name1, namespace1)
              .withDistinguishedRootNode(name2, namespace2)
   pf.isError
   ```
   
   And that would still only evaluated two `SchemaSets`, because 
`withDistinguishedRootNode` does not call `isError`.
   
   You could kind of think of this API as a builder API. Where `compileSource` 
creates a builder, the `withXYZ` functions configure that builder, and 
`isError` is the `build()` method, e.g.
   
   ```scala
   val pf = new ProcessorFactoryBuilder(uri)
              .withDistinguishedRootNode(name, namespace)
              .build()
   ```
   
   It wouldn't make any sense to create a `ProcessorFactory` when the builder 
is initialized, and then a new one when `.build()` is called. That's 
essentially what we were doing before, only we weren't actually creating a new 
one which was the bug causing `withDistinguishedRootNode` to be ignored.
   
   > If that is the justification and the motivation for it is compelling 
enough (avoid doing a considerable amount of work more than once)
   
   When you call `pf.isError`, it calls the `isError` function on the 
`SchemaSet`. And that function has this comment:
   ```scala
     /**
      * Asking if there are errors drives compilation. First the schema is 
validated, then
      * the abstract syntax tree is constructed (which could be a source of 
errors)
      * and finally the AST objects are checked for errors, which recursively
      * demands that all other aspects of compilation occur.
      */
   ```
   That's pretty much everything except asking for a runtime1 parser/unparser, 
which is what `pf.onPath()` does. So it is a lot of work that really should be 
avoided. We don't want to do all that twice, especially if the first is just 
going to be thrown away.
   
   Another option is to drop `withDistinguishedRootNode` entirely, and require 
users to provide the root name/namespace to the `compileSource` function. This 
is the only `withXYZ` function on the `ProcessorFactory`. I do kindof like the 
builder pattern though, makes it easy to add new configurations in the future 
without having to modify the new constructor.
   
   > I would say that maybe we should find a better API design
   
   I'm not against this. It's certainly not obvious to the user that .isError 
is what triggers the work. It's definitely convenient, but does side effect 
things that isn't obvious.
   
   There are a number of changes we've been considering that might warrant a 
4.0.0 release (e.g. drop Java 8, new layer API), so if we do want to change our 
API, this would be a good time to do it.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Fix ProcessorFactory.withDistinguishedRootNode and isError [daffodil]

Reply via email to