stevedlawrence commented on code in PR #1122:
URL: https://github.com/apache/daffodil/pull/1122#discussion_r1409308168
##########
daffodil-core/src/main/scala/org/apache/daffodil/core/compiler/Compiler.scala:
##########
@@ -92,13 +89,10 @@ final class ProcessorFactory private (
validateDFDLSchemas,
checkAllTopLevel,
tunables,
- Some(sset),
Review Comment:
> I would like to know more about the use case(s) where someone gets a
ProcessorFactory but abstains from calling isError on purpose. What is the
justification or motivation for such use case(s)?
Users can do this if they are confident they schema will compile without any
errors. The `ProcessorFactory.onPath` function asserts that `isError` is false,
so if a user doesn't call `isError`, then `onPath` will, and all needed work
will be done then. But if `isError` is true, `onPath` will throw an exception
saying this function should not be called if `isError` is true.
Note that this is why some tests don't call `isError`. The test just assumes
there isn't a compile error, and if there is then an assertion exception will
be thrown and the test will fail. It would probably be better to assert that
`isError` is false, but it's not technically required for the test to pass/fail
correctly.
Note that another side effect of this change is that if a user calls
`ProcessorFactory.getDiagnostics` prior to calling `isError`, then no work will
have been done and so there will be no diagnostics. This was actually the case
with the CLI and TDML runner, hence a couple of changes in this PR. We could
avoid this by modifying `getDiagnostics` to call `isError` before returning the
diags.
> Is the justification that a user might want to call various withXXX
methods on the ProcessorFactory to fine tune the ProcessorFactory's parameters
and configuration before actually using the ProcessorFactory to get a Processor?
Yeah, that's why this change was made. With only the change to create a new
`SchemaSet` when `withDistinguishedRootNode` is called, it means if we have
something like this:
```scala
val pf = compiler.compileSource(uri).withDistinguishedRootNode(name,
namespace)
pf.isError
```
Then we create and evaluate two `SchemaSet`s, one when `compileSource` is
called (because it calls `isError`) and the other when `pf.isError` is called.
Note that a user could do something like this:
```scala
val pf = compiler.compileSource(uri)
.withDistinguishedRootNode(name1, namespace1)
.withDistinguishedRootNode(name2, namespace2)
pf.isError
```
And that would still only evaluated two `SchemaSets`, because
`withDistinguishedRootNode` does not call `isError`.
You could kind of think of this API as a builder API. Where `compileSource`
creates a builder, the `withXYZ` functions configure that builder, and
`isError` is the `build()` method, e.g.
```scala
val pf = new ProcessorFactoryBuilder(uri)
.withDistinguishedRootNode(name, namespace)
.build()
```
It wouldn't make any sense to create a `ProcessorFactory` when the builder
is initialized, and then a new one when `.build()` is called. That's
essentially what we were doing before, only we weren't actually creating a new
one which was the bug causing `withDistinguishedRootNode` to be ignored.
> If that is the justification and the motivation for it is compelling
enough (avoid doing a considerable amount of work more than once)
When you call `pf.isError`, it calls the `isError` function on the
`SchemaSet`. And that function has this comment:
```scala
/**
* Asking if there are errors drives compilation. First the schema is
validated, then
* the abstract syntax tree is constructed (which could be a source of
errors)
* and finally the AST objects are checked for errors, which recursively
* demands that all other aspects of compilation occur.
*/
```
That's pretty much everything except asking for a runtime1 parser/unparser,
which is what `pf.onPath()` does. So it is a lot of work that really should be
avoided. We don't want to do all that twice, especially if the first is just
going to be thrown away.
Another option is to drop `withDistinguishedRootNode` entirely, and require
users to provide the root name/namespace to the `compileSource` function. This
is the only `withXYZ` function on the `ProcessorFactory`. I do kindof like the
builder pattern though, makes it easy to add new configurations in the future
without having to modify the new constructor.
> I would say that maybe we should find a better API design
I'm not against this. It's certainly not obvious to the user that .isError
is what triggers the work. It's definitely convenient, but does side effect
things that isn't obvious.
There are a number of changes we've been considering that might warrant a
4.0.0 release (e.g. drop Java 8, new layer API), so if we do want to change our
API, this would be a good time to do it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]