stevedlawrence commented on code in PR #1515:
URL: https://github.com/apache/daffodil/pull/1515#discussion_r2237376187
##########
daffodil-cli/src/main/scala/org/apache/daffodil/cli/Main.scala:
##########
@@ -184,21 +180,21 @@ class CLIConf(arguments: Array[String], stdout:
PrintStream, stderr: PrintStream
s match {
case DefaultArgPattern(name, arg) if Validators.isRegistered(name) =>
- if (doesNotSupportReloadableParsers(name)) {
- null
+ if (Seq(".conf", ".properties").exists(arg.endsWith)) {
+ val is = new FileInputStream(arg)
+ props.load(is)
} else {
- if (Seq(".conf", ".properties").exists(arg.endsWith)) {
- val is = new FileInputStream(arg)
- props.load(is)
- } else {
- val uss = fileResourceToURI(arg)
- props.setProperty(name, uss.uri.toString)
- }
- Validators.get(name).make(props)
+ val uss = fileResourceToURI(arg)
+ props.setProperty(name, uss.uri.toString)
}
+ Validators.get(name).make(props)
case NoArgsPattern(name) if Validators.isRegistered(name) =>
- if (doesNotSupportReloadableParsers(name)) {
- null
+ val missingRequiredSchema =
+ parser.isDefined && !Seq("limited", "off").exists(_.contains(name))
Review Comment:
I curious if `parser.isDefined` is affected by the order that properties are
provided to scallop. For example, if you do something like this:
```daffodil parse --validate=xerces -P parser.bin ...```
where -P is specified after --validate, does scallop see the `parser` is
defined here, or does it kindof set things in order, and parser won't actually
be defined yet, and so it wouldn't fail?
##########
daffodil-cli/src/main/scala/org/apache/daffodil/cli/Main.scala:
##########
@@ -184,21 +180,21 @@ class CLIConf(arguments: Array[String], stdout:
PrintStream, stderr: PrintStream
s match {
case DefaultArgPattern(name, arg) if Validators.isRegistered(name) =>
- if (doesNotSupportReloadableParsers(name)) {
- null
+ if (Seq(".conf", ".properties").exists(arg.endsWith)) {
+ val is = new FileInputStream(arg)
+ props.load(is)
} else {
- if (Seq(".conf", ".properties").exists(arg.endsWith)) {
- val is = new FileInputStream(arg)
- props.load(is)
- } else {
- val uss = fileResourceToURI(arg)
- props.setProperty(name, uss.uri.toString)
- }
- Validators.get(name).make(props)
+ val uss = fileResourceToURI(arg)
+ props.setProperty(name, uss.uri.toString)
}
+ Validators.get(name).make(props)
case NoArgsPattern(name) if Validators.isRegistered(name) =>
- if (doesNotSupportReloadableParsers(name)) {
- null
+ val missingRequiredSchema =
+ parser.isDefined && !Seq("limited", "off").exists(_.contains(name))
+ if (missingRequiredSchema) {
+ throw new Exception(
+ "To use full validation with a saved parser, the validating
schema must be provided ex: xerces=/path/to/schema.xsd"
Review Comment:
We should remove "full" since this could be a different validator like
schematron or a custom one.
##########
daffodil-cli/src/test/scala/org/apache/daffodil/cli/cliTest/TestCLISaveParser.scala:
##########
@@ -175,12 +175,9 @@ class TestCLISaveParser {
ExitCode.Success
)
- runCLI(args"parse --parser $parser --validate xerces $input") { cli =>
- cli.expectErr("[error]")
- cli.expectErr(
- "The validation name must be 'limited' or 'off' when using a saved
parser."
- )
- }(ExitCode.Usage)
+ runCLI(args"parse --parser $parser --validate xerces $input", debug =
true) { _ => }(
Review Comment:
Should we still expect an error message? We could be a bit more lax in what
we expected instead of expecting the whole error message, but making sure we
get a reasonable error message is probably worth it.
##########
daffodil-core/src/main/scala/org/apache/daffodil/runtime1/processors/DataProcessor.scala:
##########
@@ -197,6 +208,25 @@ class DataProcessor(
*/
override def clone(): DataProcessor = copy()
+ /**
+ * Returns a data processor with all the same state, but the validation name
and validator changed to that of the argument.
+ *
+ * Note that the default validation mode is "off", that is, no validation is
performed.
+ */
+ override def withValidatorName(name: String): api.DataProcessor = {
+ val schemaUriString =
+
optSchemaURI.getOrElse(ssrd.schemaFileLocation.diagnosticFile.toURI).toString
+ val properties = ValidatorFactory.makeConfig(schemaUriString)
+ val newValidator = name match {
+ case `validatorName` =>
+ Validators
+ .get(validatorName)
+ .make(properties)
+ case m => Validators.get(m).make(properties)
+ }
+ withValidator(newValidator)
+ }
Review Comment:
I think this logic, or something similar, wants to be the
DaffodilTDMLDFDLProcessor, not in the normal DataProcessor API.
##########
daffodil-tdml-processor/src/main/scala/org/apache/daffodil/processor/tdml/DaffodilTDMLDFDLProcessor.scala:
##########
@@ -204,6 +202,12 @@ class DaffodilTDMLDFDLProcessor private (private var dp:
api.DataProcessor)
copy(dp = dp.withDebugger(d))
}
+ override def withValidatorName(name: String): DaffodilTDMLDFDLProcessor = {
+ copy(
+ dp = dp.withValidatorName(name)
+ )
Review Comment:
Yeah, I would suggest this doesn't want to do this, but instead calls
`Validator.get.make` and does `dp = dp.withValidator(...)`.
##########
daffodil-cli/src/main/scala/org/apache/daffodil/cli/Main.scala:
##########
@@ -184,21 +180,21 @@ class CLIConf(arguments: Array[String], stdout:
PrintStream, stderr: PrintStream
s match {
case DefaultArgPattern(name, arg) if Validators.isRegistered(name) =>
- if (doesNotSupportReloadableParsers(name)) {
- null
+ if (Seq(".conf", ".properties").exists(arg.endsWith)) {
+ val is = new FileInputStream(arg)
+ props.load(is)
} else {
- if (Seq(".conf", ".properties").exists(arg.endsWith)) {
- val is = new FileInputStream(arg)
- props.load(is)
- } else {
- val uss = fileResourceToURI(arg)
- props.setProperty(name, uss.uri.toString)
- }
- Validators.get(name).make(props)
+ val uss = fileResourceToURI(arg)
+ props.setProperty(name, uss.uri.toString)
}
+ Validators.get(name).make(props)
case NoArgsPattern(name) if Validators.isRegistered(name) =>
- if (doesNotSupportReloadableParsers(name)) {
- null
+ val missingRequiredSchema =
+ parser.isDefined && !Seq("limited", "off").exists(_.contains(name))
Review Comment:
I'm wondering if instead of hardcoding "limited" and "off", we should just
try to make the validator? If we are using a saved parser, we won't have set
`rootSchema` or the validator name property, so creating the validator will
just fail, and it will hopefully fail with a helpful error message (e.g.
"rootSchema or xerces properties must be defined") so we don't need the custom
Exception below.
Note that the "limited" and "off" validators will be created just fine since
they don't require a rootSchema or any properties.
The benefit of this approach is that in theory someone could create a
validator that also doesn't need any special properties, similar to "limited"
or "off", so it can also work just when using a saved parser, instead of only
limiting to "limited" or "off".
##########
daffodil-core/src/main/scala/org/apache/daffodil/core/runtime1/SchemaSetRuntime1Mixin.scala:
##########
@@ -90,7 +90,13 @@ trait SchemaSetRuntime1Mixin {
s"Compiler: component counts: unique ${root.numUniqueComponents},
actual ${root.numComponents}."
)
val dataProc =
- new DataProcessor(ssrd, tunable, variableMap.copy(), diagnostics =
this.diagnostics)
+ new DataProcessor(
+ ssrd,
+ tunable,
+ variableMap.copy(),
+ diagnostics = this.diagnostics,
+ optSchemaURI = Option(schemaSource.uriForLoading)
Review Comment:
I don't think the DataProcessor should store this URI. This makes
reproducible builds a bit more difficult since different systems will have
different URI paths. Note that we don't have fully reproducible builds yet,
but it's a future goal, and this would get in the way of that.
Instead, I think the `uriForLoading` wants to be stored in the
DaffodilTDMLDFDLProcessor somewhere, and then it can be used when the
TDMLRunner calls withValidatorName to create the Validator.
##########
daffodil-core/src/main/scala/org/apache/daffodil/runtime1/processors/DataProcessor.scala:
##########
@@ -104,17 +107,12 @@ object DataProcessor {
private class SerializableDataProcessor(
Review Comment:
Can we remove the SerializableDataProcessor class now? That was only needed
to disallow withValidator on saved parsers, but that is allowed now so this
class doesn't really provided any value anymore.
##########
daffodil-core/src/main/java/org/apache/daffodil/api/DataProcessor.java:
##########
@@ -81,6 +81,16 @@ default DataProcessor withDebuggerRunner(DebuggerRunner dr) {
*/
DataProcessor withDebugger(Debugger dbg);
+ /**
+ * Obtain a new {@link DataProcessor} having a specific validator name.
+ *
+ * @param validatorName Validator name which could be 'off' for NoValidator,
'limited' for DaffodilLimitedValidator,
+ * 'xerces' for XercesValidator, schematron for
SchematronValidator,
+ * or the name of a custom validator that implements
{@link Validator}
+ * @return a new {@link DataProcessor} with a validator with the specified
validator name.
+ */
+ DataProcessor withValidatorName(String validatorName);
+
Review Comment:
I'm not sure this should be partof the DataProcessor API. I think we should
still require that API users create and configure Validators via the new
Validator factory stuff.
I think only the TDMLDFDLProcessor API wants to have withValidatorName. The
DaffodilTDMLDFDLProcessor will implement that API, where it will create a
Validator based off of that name (e.g. "full" becomes
`Validator.get("xerces").make(...)`, and then call `dp =
dp.withValidator(validator)`.
So the TDML still only supports "on", "limited", and "off", and it's up to
TDML implementations to convert those keywords to whatever they internally use
for validation. In the case of IBM DFDL, I think that's just a boolean
(validation is either on or off), but the Daffodil implementations will do more
complex stuff like create a Xerces Validator.
Also, since this API is specific to TDML, I wonder if we just want to call
it `withValidation` to match what it looks like in the TDML files?
##########
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala:
##########
@@ -954,11 +940,7 @@ abstract class TestCase(testCaseXML: NodeSeq, val parent:
DFDLTestSuite) {
_.nBits
}
- val useSerializedProcessor =
- if (validatorName == "xerces") false
- else if (defaultValidatorName == "xerces") false
- else if (optExpectedWarnings.isDefined) false
- else true
+ val useSerializedProcessor = optExpectedWarnings.isEmpty
Review Comment:
Do we know why expecting warnings needs to disable serialized processors?
I guess the issue is that saved DataProcessors discard diagnostics? And I
guess the TDML Runner doesn't save those diagnostics?
I wonder if we should also change that as a separate PR? It might add some
extra complication since we now need to store compile diagnostics separately
and recombine them into the parse/unparse diagnostics. Not sure how much it
would be worth it, but it would be nice to remove this exception.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]