stevedlawrence commented on code in PR #1515:
URL: https://github.com/apache/daffodil/pull/1515#discussion_r2237376187


##########
daffodil-cli/src/main/scala/org/apache/daffodil/cli/Main.scala:
##########
@@ -184,21 +180,21 @@ class CLIConf(arguments: Array[String], stdout: 
PrintStream, stderr: PrintStream
 
       s match {
         case DefaultArgPattern(name, arg) if Validators.isRegistered(name) =>
-          if (doesNotSupportReloadableParsers(name)) {
-            null
+          if (Seq(".conf", ".properties").exists(arg.endsWith)) {
+            val is = new FileInputStream(arg)
+            props.load(is)
           } else {
-            if (Seq(".conf", ".properties").exists(arg.endsWith)) {
-              val is = new FileInputStream(arg)
-              props.load(is)
-            } else {
-              val uss = fileResourceToURI(arg)
-              props.setProperty(name, uss.uri.toString)
-            }
-            Validators.get(name).make(props)
+            val uss = fileResourceToURI(arg)
+            props.setProperty(name, uss.uri.toString)
           }
+          Validators.get(name).make(props)
         case NoArgsPattern(name) if Validators.isRegistered(name) =>
-          if (doesNotSupportReloadableParsers(name)) {
-            null
+          val missingRequiredSchema =
+            parser.isDefined && !Seq("limited", "off").exists(_.contains(name))

Review Comment:
   I curious if `parser.isDefined` is affected by the order that properties are 
provided to scallop. For example, if you do something like this:
   
   ```daffodil parse --validate=xerces -P parser.bin ...```
   
   where -P is specified after --validate, does scallop see the `parser` is 
defined here, or does it kindof set things in order, and parser won't actually 
be defined yet, and so it wouldn't fail?



##########
daffodil-cli/src/main/scala/org/apache/daffodil/cli/Main.scala:
##########
@@ -184,21 +180,21 @@ class CLIConf(arguments: Array[String], stdout: 
PrintStream, stderr: PrintStream
 
       s match {
         case DefaultArgPattern(name, arg) if Validators.isRegistered(name) =>
-          if (doesNotSupportReloadableParsers(name)) {
-            null
+          if (Seq(".conf", ".properties").exists(arg.endsWith)) {
+            val is = new FileInputStream(arg)
+            props.load(is)
           } else {
-            if (Seq(".conf", ".properties").exists(arg.endsWith)) {
-              val is = new FileInputStream(arg)
-              props.load(is)
-            } else {
-              val uss = fileResourceToURI(arg)
-              props.setProperty(name, uss.uri.toString)
-            }
-            Validators.get(name).make(props)
+            val uss = fileResourceToURI(arg)
+            props.setProperty(name, uss.uri.toString)
           }
+          Validators.get(name).make(props)
         case NoArgsPattern(name) if Validators.isRegistered(name) =>
-          if (doesNotSupportReloadableParsers(name)) {
-            null
+          val missingRequiredSchema =
+            parser.isDefined && !Seq("limited", "off").exists(_.contains(name))
+          if (missingRequiredSchema) {
+            throw new Exception(
+              "To use full validation with a saved parser, the validating 
schema must be provided ex: xerces=/path/to/schema.xsd"

Review Comment:
   We should remove "full" since this could be a different validator like 
schematron or a custom one.



##########
daffodil-cli/src/test/scala/org/apache/daffodil/cli/cliTest/TestCLISaveParser.scala:
##########
@@ -175,12 +175,9 @@ class TestCLISaveParser {
         ExitCode.Success
       )
 
-      runCLI(args"parse --parser $parser --validate xerces $input") { cli =>
-        cli.expectErr("[error]")
-        cli.expectErr(
-          "The validation name must be 'limited' or 'off' when using a saved 
parser."
-        )
-      }(ExitCode.Usage)
+      runCLI(args"parse --parser $parser --validate xerces $input", debug = 
true) { _ => }(

Review Comment:
   Should we still expect an error message? We could be a bit more lax in what 
we expected instead of expecting the whole error message, but making sure we 
get a reasonable error message is probably worth it.



##########
daffodil-core/src/main/scala/org/apache/daffodil/runtime1/processors/DataProcessor.scala:
##########
@@ -197,6 +208,25 @@ class DataProcessor(
    */
   override def clone(): DataProcessor = copy()
 
+  /**
+   * Returns a data processor with all the same state, but the validation name 
and validator changed to that of the argument.
+   *
+   * Note that the default validation mode is "off", that is, no validation is 
performed.
+   */
+  override def withValidatorName(name: String): api.DataProcessor = {
+    val schemaUriString =
+      
optSchemaURI.getOrElse(ssrd.schemaFileLocation.diagnosticFile.toURI).toString
+    val properties = ValidatorFactory.makeConfig(schemaUriString)
+    val newValidator = name match {
+      case `validatorName` =>
+        Validators
+          .get(validatorName)
+          .make(properties)
+      case m => Validators.get(m).make(properties)
+    }
+    withValidator(newValidator)
+  }

Review Comment:
   I think this logic, or something similar, wants to be the 
DaffodilTDMLDFDLProcessor, not in the normal DataProcessor API.



##########
daffodil-tdml-processor/src/main/scala/org/apache/daffodil/processor/tdml/DaffodilTDMLDFDLProcessor.scala:
##########
@@ -204,6 +202,12 @@ class DaffodilTDMLDFDLProcessor private (private var dp: 
api.DataProcessor)
     copy(dp = dp.withDebugger(d))
   }
 
+  override def withValidatorName(name: String): DaffodilTDMLDFDLProcessor = {
+    copy(
+      dp = dp.withValidatorName(name)
+    )

Review Comment:
   Yeah, I would suggest this doesn't want to do this, but instead calls 
`Validator.get.make` and does `dp = dp.withValidator(...)`.
   



##########
daffodil-cli/src/main/scala/org/apache/daffodil/cli/Main.scala:
##########
@@ -184,21 +180,21 @@ class CLIConf(arguments: Array[String], stdout: 
PrintStream, stderr: PrintStream
 
       s match {
         case DefaultArgPattern(name, arg) if Validators.isRegistered(name) =>
-          if (doesNotSupportReloadableParsers(name)) {
-            null
+          if (Seq(".conf", ".properties").exists(arg.endsWith)) {
+            val is = new FileInputStream(arg)
+            props.load(is)
           } else {
-            if (Seq(".conf", ".properties").exists(arg.endsWith)) {
-              val is = new FileInputStream(arg)
-              props.load(is)
-            } else {
-              val uss = fileResourceToURI(arg)
-              props.setProperty(name, uss.uri.toString)
-            }
-            Validators.get(name).make(props)
+            val uss = fileResourceToURI(arg)
+            props.setProperty(name, uss.uri.toString)
           }
+          Validators.get(name).make(props)
         case NoArgsPattern(name) if Validators.isRegistered(name) =>
-          if (doesNotSupportReloadableParsers(name)) {
-            null
+          val missingRequiredSchema =
+            parser.isDefined && !Seq("limited", "off").exists(_.contains(name))

Review Comment:
   I'm wondering if instead of hardcoding "limited" and "off", we should just 
try to make the validator? If we are using a saved parser, we won't have set 
`rootSchema` or the validator name property, so creating the validator will 
just fail, and it will hopefully fail with a helpful error message (e.g. 
"rootSchema or xerces properties must be defined") so we don't need the custom 
Exception below.
   
   Note that the "limited" and "off" validators will be created just fine since 
they don't require a rootSchema or any properties.
   
   The benefit of this approach is that in theory someone could create a 
validator that also doesn't need any special properties, similar to "limited" 
or "off", so it can also work just when using a saved parser, instead of only 
limiting to "limited" or "off".



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/runtime1/SchemaSetRuntime1Mixin.scala:
##########
@@ -90,7 +90,13 @@ trait SchemaSetRuntime1Mixin {
         s"Compiler: component counts: unique ${root.numUniqueComponents}, 
actual ${root.numComponents}."
       )
     val dataProc =
-      new DataProcessor(ssrd, tunable, variableMap.copy(), diagnostics = 
this.diagnostics)
+      new DataProcessor(
+        ssrd,
+        tunable,
+        variableMap.copy(),
+        diagnostics = this.diagnostics,
+        optSchemaURI = Option(schemaSource.uriForLoading)

Review Comment:
   I don't think the DataProcessor should store this URI. This makes 
reproducible builds a bit more difficult since different systems will have 
different  URI paths. Note that we don't have fully reproducible builds yet, 
but it's a future goal, and this would get in the way of that.
   
   Instead, I think the `uriForLoading` wants to be stored in the 
DaffodilTDMLDFDLProcessor somewhere, and then it can be used when the 
TDMLRunner calls withValidatorName to create the Validator.



##########
daffodil-core/src/main/scala/org/apache/daffodil/runtime1/processors/DataProcessor.scala:
##########
@@ -104,17 +107,12 @@ object DataProcessor {
   private class SerializableDataProcessor(

Review Comment:
   Can we remove the SerializableDataProcessor class now? That was only needed 
to disallow withValidator on saved parsers, but that is allowed now so this 
class doesn't really provided any value anymore.



##########
daffodil-core/src/main/java/org/apache/daffodil/api/DataProcessor.java:
##########
@@ -81,6 +81,16 @@ default DataProcessor withDebuggerRunner(DebuggerRunner dr) {
    */
   DataProcessor withDebugger(Debugger dbg);
 
+  /**
+   * Obtain a new {@link DataProcessor} having a specific validator name.
+   *
+   * @param validatorName Validator name which could be 'off' for NoValidator, 
'limited' for DaffodilLimitedValidator,
+   *                      'xerces' for XercesValidator, schematron for 
SchematronValidator,
+   *                      or the name of a custom validator that implements 
{@link Validator}
+   * @return a new {@link DataProcessor} with a validator with the specified 
validator name.
+   */
+  DataProcessor withValidatorName(String validatorName);
+

Review Comment:
   I'm not sure this should be partof the DataProcessor API. I think we should 
still require that API users create and configure Validators via the new 
Validator factory stuff.
   
   I think only the TDMLDFDLProcessor API wants to have withValidatorName. The 
DaffodilTDMLDFDLProcessor will implement that API, where it will create a 
Validator based off of that name (e.g. "full" becomes 
`Validator.get("xerces").make(...)`, and then call `dp = 
dp.withValidator(validator)`.
   
   So the TDML still only supports "on", "limited", and "off", and it's up to 
TDML implementations to convert those keywords to whatever they internally use 
for validation. In the case of IBM DFDL, I think that's just a boolean 
(validation is either on or off), but the Daffodil implementations will do more 
complex stuff like create a Xerces Validator.
   
   Also, since this API is specific to TDML, I wonder if we just want to call 
it `withValidation` to match what it looks like in the TDML files?



##########
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala:
##########
@@ -954,11 +940,7 @@ abstract class TestCase(testCaseXML: NodeSeq, val parent: 
DFDLTestSuite) {
         _.nBits
       }
 
-      val useSerializedProcessor =
-        if (validatorName == "xerces") false
-        else if (defaultValidatorName == "xerces") false
-        else if (optExpectedWarnings.isDefined) false
-        else true
+      val useSerializedProcessor = optExpectedWarnings.isEmpty

Review Comment:
   Do we know why expecting warnings needs to disable serialized processors?
   
   I guess the issue is that saved DataProcessors discard diagnostics? And I 
guess the TDML Runner doesn't save those diagnostics?
   
   I wonder if we should also change that as a separate PR? It might add some 
extra complication since we now need to store compile diagnostics separately 
and recombine them into the parse/unparse diagnostics. Not sure how much it 
would be worth it, but it would be nice to remove this exception.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to