PR is in at https://github.com/apache/incubator-daffodil/pull/431
On Wed, Oct 7, 2020 at 9:43 AM john wass <jwa...@gmail.com> wrote: > Based on the feedback it sounds like the approach is sane enough to put > together a PR. > > Thanks all for the reviews and feedback. > > > On Thu, Oct 1, 2020 at 6:11 PM Beckerle, Mike < > mbecke...@owlcyberdefense.com> wrote: > >> FYI: John Wass - I am also going to surf your code a bit so may have more >> comments. I've fetched from your ctc-oss repositories. >> >> One thing the UDF code did work through is how to define a SPI-based >> feature for Daffodil and also include test-specific instances of it and >> test them, all in the daffodil source tree. >> >> ________________________________ >> From: Beckerle, Mike <mbecke...@owlcyberdefense.com> >> Sent: Thursday, October 1, 2020 5:52 PM >> To: dev@daffodil.apache.org <dev@daffodil.apache.org> >> Subject: Re: Validator SPI proposal >> >> A few thoughts on top of John Interrante's review. >> >> The validator code/clases being found via SPI seems good. Sharing >> code/library with the existing usage for UDFs would be nice if it works out. >> >> The validator code reads in various "specs". >> >> For XML Schema validation with xerces, this is the XML Schema (which is >> also the DFDL schema). >> >> For Schematron validation, I know some people have asked for the ability >> to express the schematron rules on the DFDL schema as added schema >> annotation elements, positioning them on elements and having the "." path >> expression refer to the element corresonding to the element declaration >> upon which the rule is placed. They end up looking somewhat like DFDL's >> assertions, but the schematron rules use full XPath, and so can do somewhat >> more, and they are operating on the XML Infoset, not the DFDL Infoset. >> >> But regardless of whether the schematron rules are extracted from the >> DFDL schema or from another file, the schematron validator, just like >> xerces, effectively has to compile that "spec" information into an internal >> data structure that enables fast validation. >> >> So a requirement is that this happens once only, at startup time >> regardless of how many times parse/unparse are called. >> >> Ideally, one would be able to serialize the result of this compilation >> i.e., save and serialize the validator so that it needn't be recompiled at >> all if reloaded. If this compiled validator is serializable, then just >> making that value a member of the SchemaSetRuntimeData class should do it, >> as that object and all its members get serialized now. >> >> So if possible the validator API should accommodate this >> compile/save/reload cycle. >> >> btw: daffodil has validation options for parse, but not for unparse >> currently. It should have the option to validate the incoming infoset >> before unparsing as well. >> >> Re: Your "unknowns" >> >> - How to approach breaking changes in the Validator API >> >> This is a general issue with Daffodil APIs. I think we have previously >> adopted a posture of that we would support API change by retaining existing >> but deprecated APIs for a release or two before phasing them out. We try to >> sort these out in design discussions of APIs or in Pull-Request reviews >> that have API changes in them. >> >> - How to evolve serialized API objects to prevent breakage in existing >> serialized objects (specifically from daffodil.api.ValidationMode) >> >> We have heretofore punted this in Daffodil generally. Saved >> parser/unparsers are version specific. If we want to fix this we should use >> a general approach for all Daffodil's serialized objects. You are >> proposing to change ValidationMode so that really, it's not an enum any >> more, it can use identifiers that are pulled from classpath/SPI objects >> found. >> >> In that case the code that uses ValidationMode will have to change to >> use something more general. Probably ValidationMode itself has to go away >> as a concept replaced by a ValidatorSpec class which can be constructed >> from a string. >> >> Then maybe ValidationMode.on isn't an enum at all any more, but a method >> that returns a singleton ValidationSpec for the xerces built in validator? >> >> - Is there a better overall approach to this :P >> >> Gotta start somewhere. >> >> -mikeb >> >> ________________________________ >> From: Wass, John L <wa...@ctc.com> >> Sent: Wednesday, September 30, 2020 3:19 PM >> To: dev@daffodil.apache.org <dev@daffodil.apache.org> >> Subject: Re: Validator SPI proposal >> >> Thanks for the review John. >> >> > Then how do you combine your forked Daffodil and sample schemetron >> implementation/application together so that your simplest usage example >> actually works? >> >> Sure. The missing instructions are below and they were also added to the >> readme in the sample app repo. >> >> --- >> >> 1. From the root of daffodil; stage the cli package >> `sbt daffodil-cli/universal:stage` >> >> 2. From `daffodil-cli/target/universal/stage`; run the application, >> verifying it fails as expected due to missing schematron validator jar >> `./bin/daffodil parse --schema $data_dir/bmp.dfdl.xsd --validate >> sch=$data_dir/bmp.sch $data_dir/MARBLES.BMP` >> Should result in >> `[error] Bad arguments for option 'validate': 'sch=/sample/data/bmp.sch' >> - Unrecognized ValidationMode sch=/sample/data/bmp.sch. Must be 'on', >> 'limited', 'off', or name of spi validator.` >> >> 3. From the root of schematron validator; create an assembly jar >> `sbt assembly` >> >> 4. From `daffodil-schematron-validator/target/scala-2.12`; copy the >> validator jar to the staged daffodil-cli application lib dir >> `daffodil-cli/target/universal/stage/lib` >> >> 5. From `daffodil-cli/target/universal/stage`; run the application >> `./bin/daffodil parse --schema $data_dir/bmp.dfdl.xsd --validate >> sch=$data_dir/bmp.sch $data_dir/MARBLES.BMP` >> >> 6. See the parsed BMP with schematron validation status dumped to stdout. >> >> >> Note the exported path to the schematron validator data dir as `data_dir` >> in the examples above. >> >> >> --- >> >> >> > I'd have to do some research before I could say something about your >> bullet items for discussion: >> >> Thanks. I am actively thinking about these and welcome input. Will >> follow up with additional thoughts... >> >> >> > FYI, Daffodil already uses the ServiceLoader API to load user defined >> functions (daffodil-udf) >> >> Excellent. I did search for SPI related things when first looking at this >> but somehow missed that implementation. I'll review it. >> >> >> Appreciate the feedback, and looking forward to hearing if the app runs >> for you :) >> >> >> john >> >> >> ________________________________ >> From: Interrante, John A (GE Research, US) <inter...@research.ge.com> >> Sent: Tuesday, September 29, 2020 5:34 PM >> To: dev@daffodil.apache.org >> Subject: RE: Validator SPI proposal >> >> Hello John, >> >> Using ServiceLoader looks reasonable. I looked at your reference >> implementation and sample application, but can you clarify a question for >> me? First you build your forked Daffodil and your sample application >> separately in different directories. Then how do you combine your forked >> Daffodil and sample schemetron implementation/application together so that >> your simplest usage example actually works? That is, do you need to do >> step 1 below? >> >> 1. Copy a jar from daffodil-schematron-validator/target/... to >> incubator-daffodil/daffodil-cli/target/universal/stage/lib? >> $ <please fill in this step> >> 2. Define an alias (or create a symbolic link) to allow you to run your >> freshly built daffodil executable? >> $ alias >> daffodil="$HOME/incubator-daffodil/daffodil-cli/target/universal/stage/bin/daffodil" >> 3. Run your simplest usage example? >> $ cd daffodil-schematron-validator >> $ daffodil parse --schema data/bmp.dfdl.xsd --validate >> sch=data/bmp.sch data/MARBLES.BMP >> >> I'd have to do some research before I could say something about your >> bullet items for discussion: >> >> - How to approach breaking changes in the Validator API >> - How to evolve serialized API objects to prevent breakage in existing >> serialized objects (specifically from daffodil.api.ValidationMode) >> - Is there a better overall approach to this >> >> FYI, Daffodil already uses the ServiceLoader API to load user defined >> functions (daffodil-udf). I don't know much about the UDF files; I found >> them only because I searched for any occurrences of ServiceLoader in >> Daffodil. I don't know if you have seen these files and whether any of >> them informed your implementation, but I'll append a list of the UDF files >> for you to look at. >> >> interran@GH3WPL13E:~/apache/incubator-daffodil-asf$ fd udf >> daffodil-cli/src/it/scala/org/apache/daffodil/udf >> daffodil-cli/src/it/scala/org/apache/daffodil/udf/TestCLIUdfs.scala >> daffodil-runtime1/src/main/scala/org/apache/daffodil/udf >> >> daffodil-test/src/test/resources/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider >> daffodil-test/src/test/resources/org/apache/daffodil/udf >> daffodil-test/src/test/resources/org/apache/daffodil/udf/udfs.tdml >> daffodil-test/src/test/scala/org/apache/daffodil/udf >> >> daffodil-test/src/test/scala/org/apache/daffodil/udf/TestUdfsInSchemas.scala >> daffodil-udf >> daffodil-udf/src/main/java/org/apache/daffodil/udf >> daffodil-udf/src/test/java/org/badudfs >> >> daffodil-udf/src/test/java/org/badudfs/annotations/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider >> >> daffodil-udf/src/test/java/org/badudfs/evaluate/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider >> >> daffodil-udf/src/test/java/org/badudfs/functionclasses1/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider >> >> daffodil-udf/src/test/java/org/badudfs/functionclasses2/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider >> daffodil-udf/src/test/java/org/badudfs/nonUDF >> >> daffodil-udf/src/test/java/org/badudfs/nonUDF/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider >> daffodil-udf/src/test/java/org/jgoodudfs >> daffodil-udf/src/test/resources/org/apache/daffodil/udf >> >> daffodil-udf/src/test/resources/org/apache/daffodil/udf/genericUdfSchema.xsd >> >> daffodil-udf/src/test/resources/org/badmetainf/nonexistentclass/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider >> >> daffodil-udf/src/test/resources/org/goodmetainf/IntegerFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider >> >> daffodil-udf/src/test/resources/org/goodmetainf/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider >> daffodil-udf/src/test/scala/org/sbadudfs >> >> daffodil-udf/src/test/scala/org/sbadudfs/functionclasses/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider >> >> daffodil-udf/src/test/scala/org/sbadudfs/functionclasses2/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider >> daffodil-udf/src/test/scala/org/sbadudfs/udfexceptions >> >> daffodil-udf/src/test/scala/org/sbadudfs/udfexceptions/evaluating/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider >> daffodil-udf/src/test/scala/org/sbadudfs/udfexceptions2 >> >> daffodil-udf/src/test/scala/org/sbadudfs/udfexceptions2/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider >> daffodil-udf/src/test/scala/org/sbadudfs/udfpexceptions >> >> daffodil-udf/src/test/scala/org/sbadudfs/udfpexceptions/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider >> daffodil-udf/src/test/scala/org/sbadudfs/udfpexceptions2 >> >> daffodil-udf/src/test/scala/org/sbadudfs/udfpexceptions2/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider >> daffodil-udf/src/test/scala/org/sgoodudfs >> interran@GH3WPL13E:~/apache/incubator-daffodil-asf$ rg ServiceLoader >> daffodil-udf/README.md >> 36:This class will act as a traditional service provider as explained in >> the ServiceLoader API, and must have a >> *META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider* >> file in its project. This file must contain the fully qualified name(s) of >> the **provider class(es)** in the JAR. Without that file, neither this >> class nor any of the User Defined Function classes it provides will be >> visible to Daffodil. >> 86:Each UDF is registered by including the fully qualified name of its >> provider in a text file named >> `META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider`. >> The META-INF folder must be accessible from the root of whatever paths are >> on the classpath, otherwise it won't be picked up by ServiceLoader. >> >> >> daffodil-udf/src/main/java/org/apache/daffodil/udf/UserDefinedFunctionProvider.java >> 21: * Abstract class used by ServiceLoader to poll for UDF providers on >> classpath. >> >> >> daffodil-runtime1/src/main/scala/org/apache/daffodil/udf/UserDefinedFunctionService.scala >> 22:import java.util.ServiceLoader >> 83: val loader: ServiceLoader[UserDefinedFunctionProvider] = >> ServiceLoader.load(classOf[UserDefinedFunctionProvider]) >> 185: * We catch any errors thrown by the ServiceLoader here. This >> usually means UDFP >> interran@GH3WPL13E:~/apache/incubator-daffodil-asf$ >> >> John >> >> -----Original Message----- >> From: Wass, John L <wa...@ctc.com> >> Sent: Tuesday, September 29, 2020 10:20 AM >> To: dev@daffodil.apache.org >> Subject: EXT: Validator SPI proposal >> >> Greetings, >> >> Please consider the following proposal to extend the Daffodil Infoset >> Validation API. The proposed changes support deploying custom validation >> implementations that are not built as part of the Daffodil distribution but >> are instead made available at runtime as Java Service Provider Interface >> (SPI) [1] "plug-ins". >> >> The intent here is to enable a wide range of validation approaches >> without increasing overhead for the Daffodil project, while increasing the >> velocity at which such implementations can be deployed. >> >> To support the discussion there is a minimally functional reference >> implementation for Daffodil[2] and sample application using Schematron in a >> standalone project[3]. >> >> I look forward to discussing the approach in more detail. >> >> >> Approach >> --- >> >> 1. Extract a Validator interface that describes validation behavior. >> 2. Detect implementations of this interface at runtime using SPI. >> 3. Parse additional validation arguments from CLI 4. Pass "Custom" >> validators through the existing api.ValidationMode. >> 5. Change ParseResult to execute validation through a SPI provided >> instance. >> >> - Instances of the Validator are accessed at runtime using SPI metadata >> from META-INF. >> - The existing Validator behavior remains and is installed as the >> "default" behavior. >> - The current CLI arguments for validation would not change, but an >> extended set of parse patterns is added. >> >> >> CLI Usage >> --- >> >> In the Schematron sample application there are a few CLI patterns >> impemented for reference. >> >> The simplest usage, using the BMP schema, is >> >> `daffodil parse --schema data/bmp.dfdl.xsd --validate sch=data/bmp.sch >> data/MARBLES.BMP` >> >> Where 'sch' is the lookup name for the SPI validator and following the >> '=' is an argument which points to the schematron to use. >> >> There are other argument configurations that will need discussed. >> >> >> Unknowns >> --- >> >> - How to approach breaking changes in the Validator API >> - How to evolve serialized API objects to prevent breakage in existing >> serialized objects (specifically from daffodil.api.ValidationMode) >> - Is there a better overall approach to this :P >> >> >> >> 1. https://docs.oracle.com/javase/tutorial/ext/basics/spi.html >> 2. https://github.com/ctc-oss/incubator-daffodil >> 3. https://github.com/ctc-oss/daffodil-schematron-validator >> >> >> >> -- >> John Wass >> Software Engineer >> Concurrent Technologies Corporation >> >> >> ----------------------------------------------------------------- >> This message and any files transmitted within are intended solely for the >> addressee or its representative and may contain company proprietary >> information. If you are not the intended recipient, notify the sender >> immediately and delete this message. Publication, reproduction, >> forwarding, or content disclosure is prohibited without the consent of the >> original sender and may be unlawful. >> >> Concurrent Technologies Corporation and its Affiliates. >> www.ctc.com<http://www.ctc.com> 1-800-282-4392 >> ----------------------------------------------------------------- >> >>