Based on the feedback it sounds like the approach is sane enough to put together a PR.
Thanks all for the reviews and feedback. On Thu, Oct 1, 2020 at 6:11 PM Beckerle, Mike <mbecke...@owlcyberdefense.com> wrote: > FYI: John Wass - I am also going to surf your code a bit so may have more > comments. I've fetched from your ctc-oss repositories. > > One thing the UDF code did work through is how to define a SPI-based > feature for Daffodil and also include test-specific instances of it and > test them, all in the daffodil source tree. > > ________________________________ > From: Beckerle, Mike <mbecke...@owlcyberdefense.com> > Sent: Thursday, October 1, 2020 5:52 PM > To: dev@daffodil.apache.org <dev@daffodil.apache.org> > Subject: Re: Validator SPI proposal > > A few thoughts on top of John Interrante's review. > > The validator code/clases being found via SPI seems good. Sharing > code/library with the existing usage for UDFs would be nice if it works out. > > The validator code reads in various "specs". > > For XML Schema validation with xerces, this is the XML Schema (which is > also the DFDL schema). > > For Schematron validation, I know some people have asked for the ability > to express the schematron rules on the DFDL schema as added schema > annotation elements, positioning them on elements and having the "." path > expression refer to the element corresonding to the element declaration > upon which the rule is placed. They end up looking somewhat like DFDL's > assertions, but the schematron rules use full XPath, and so can do somewhat > more, and they are operating on the XML Infoset, not the DFDL Infoset. > > But regardless of whether the schematron rules are extracted from the DFDL > schema or from another file, the schematron validator, just like xerces, > effectively has to compile that "spec" information into an internal data > structure that enables fast validation. > > So a requirement is that this happens once only, at startup time > regardless of how many times parse/unparse are called. > > Ideally, one would be able to serialize the result of this compilation > i.e., save and serialize the validator so that it needn't be recompiled at > all if reloaded. If this compiled validator is serializable, then just > making that value a member of the SchemaSetRuntimeData class should do it, > as that object and all its members get serialized now. > > So if possible the validator API should accommodate this > compile/save/reload cycle. > > btw: daffodil has validation options for parse, but not for unparse > currently. It should have the option to validate the incoming infoset > before unparsing as well. > > Re: Your "unknowns" > > - How to approach breaking changes in the Validator API > > This is a general issue with Daffodil APIs. I think we have previously > adopted a posture of that we would support API change by retaining existing > but deprecated APIs for a release or two before phasing them out. We try to > sort these out in design discussions of APIs or in Pull-Request reviews > that have API changes in them. > > - How to evolve serialized API objects to prevent breakage in existing > serialized objects (specifically from daffodil.api.ValidationMode) > > We have heretofore punted this in Daffodil generally. Saved > parser/unparsers are version specific. If we want to fix this we should use > a general approach for all Daffodil's serialized objects. You are > proposing to change ValidationMode so that really, it's not an enum any > more, it can use identifiers that are pulled from classpath/SPI objects > found. > > In that case the code that uses ValidationMode will have to change to use > something more general. Probably ValidationMode itself has to go away as a > concept replaced by a ValidatorSpec class which can be constructed from a > string. > > Then maybe ValidationMode.on isn't an enum at all any more, but a method > that returns a singleton ValidationSpec for the xerces built in validator? > > - Is there a better overall approach to this :P > > Gotta start somewhere. > > -mikeb > > ________________________________ > From: Wass, John L <wa...@ctc.com> > Sent: Wednesday, September 30, 2020 3:19 PM > To: dev@daffodil.apache.org <dev@daffodil.apache.org> > Subject: Re: Validator SPI proposal > > Thanks for the review John. > > > Then how do you combine your forked Daffodil and sample schemetron > implementation/application together so that your simplest usage example > actually works? > > Sure. The missing instructions are below and they were also added to the > readme in the sample app repo. > > --- > > 1. From the root of daffodil; stage the cli package > `sbt daffodil-cli/universal:stage` > > 2. From `daffodil-cli/target/universal/stage`; run the application, > verifying it fails as expected due to missing schematron validator jar > `./bin/daffodil parse --schema $data_dir/bmp.dfdl.xsd --validate > sch=$data_dir/bmp.sch $data_dir/MARBLES.BMP` > Should result in > `[error] Bad arguments for option 'validate': 'sch=/sample/data/bmp.sch' - > Unrecognized ValidationMode sch=/sample/data/bmp.sch. Must be 'on', > 'limited', 'off', or name of spi validator.` > > 3. From the root of schematron validator; create an assembly jar > `sbt assembly` > > 4. From `daffodil-schematron-validator/target/scala-2.12`; copy the > validator jar to the staged daffodil-cli application lib dir > `daffodil-cli/target/universal/stage/lib` > > 5. From `daffodil-cli/target/universal/stage`; run the application > `./bin/daffodil parse --schema $data_dir/bmp.dfdl.xsd --validate > sch=$data_dir/bmp.sch $data_dir/MARBLES.BMP` > > 6. See the parsed BMP with schematron validation status dumped to stdout. > > > Note the exported path to the schematron validator data dir as `data_dir` > in the examples above. > > > --- > > > > I'd have to do some research before I could say something about your > bullet items for discussion: > > Thanks. I am actively thinking about these and welcome input. Will follow > up with additional thoughts... > > > > FYI, Daffodil already uses the ServiceLoader API to load user defined > functions (daffodil-udf) > > Excellent. I did search for SPI related things when first looking at this > but somehow missed that implementation. I'll review it. > > > Appreciate the feedback, and looking forward to hearing if the app runs > for you :) > > > john > > > ________________________________ > From: Interrante, John A (GE Research, US) <inter...@research.ge.com> > Sent: Tuesday, September 29, 2020 5:34 PM > To: dev@daffodil.apache.org > Subject: RE: Validator SPI proposal > > Hello John, > > Using ServiceLoader looks reasonable. I looked at your reference > implementation and sample application, but can you clarify a question for > me? First you build your forked Daffodil and your sample application > separately in different directories. Then how do you combine your forked > Daffodil and sample schemetron implementation/application together so that > your simplest usage example actually works? That is, do you need to do > step 1 below? > > 1. Copy a jar from daffodil-schematron-validator/target/... to > incubator-daffodil/daffodil-cli/target/universal/stage/lib? > $ <please fill in this step> > 2. Define an alias (or create a symbolic link) to allow you to run your > freshly built daffodil executable? > $ alias > daffodil="$HOME/incubator-daffodil/daffodil-cli/target/universal/stage/bin/daffodil" > 3. Run your simplest usage example? > $ cd daffodil-schematron-validator > $ daffodil parse --schema data/bmp.dfdl.xsd --validate > sch=data/bmp.sch data/MARBLES.BMP > > I'd have to do some research before I could say something about your > bullet items for discussion: > > - How to approach breaking changes in the Validator API > - How to evolve serialized API objects to prevent breakage in existing > serialized objects (specifically from daffodil.api.ValidationMode) > - Is there a better overall approach to this > > FYI, Daffodil already uses the ServiceLoader API to load user defined > functions (daffodil-udf). I don't know much about the UDF files; I found > them only because I searched for any occurrences of ServiceLoader in > Daffodil. I don't know if you have seen these files and whether any of > them informed your implementation, but I'll append a list of the UDF files > for you to look at. > > interran@GH3WPL13E:~/apache/incubator-daffodil-asf$ fd udf > daffodil-cli/src/it/scala/org/apache/daffodil/udf > daffodil-cli/src/it/scala/org/apache/daffodil/udf/TestCLIUdfs.scala > daffodil-runtime1/src/main/scala/org/apache/daffodil/udf > > daffodil-test/src/test/resources/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider > daffodil-test/src/test/resources/org/apache/daffodil/udf > daffodil-test/src/test/resources/org/apache/daffodil/udf/udfs.tdml > daffodil-test/src/test/scala/org/apache/daffodil/udf > > daffodil-test/src/test/scala/org/apache/daffodil/udf/TestUdfsInSchemas.scala > daffodil-udf > daffodil-udf/src/main/java/org/apache/daffodil/udf > daffodil-udf/src/test/java/org/badudfs > > daffodil-udf/src/test/java/org/badudfs/annotations/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider > > daffodil-udf/src/test/java/org/badudfs/evaluate/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider > > daffodil-udf/src/test/java/org/badudfs/functionclasses1/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider > > daffodil-udf/src/test/java/org/badudfs/functionclasses2/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider > daffodil-udf/src/test/java/org/badudfs/nonUDF > > daffodil-udf/src/test/java/org/badudfs/nonUDF/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider > daffodil-udf/src/test/java/org/jgoodudfs > daffodil-udf/src/test/resources/org/apache/daffodil/udf > > daffodil-udf/src/test/resources/org/apache/daffodil/udf/genericUdfSchema.xsd > > daffodil-udf/src/test/resources/org/badmetainf/nonexistentclass/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider > > daffodil-udf/src/test/resources/org/goodmetainf/IntegerFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider > > daffodil-udf/src/test/resources/org/goodmetainf/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider > daffodil-udf/src/test/scala/org/sbadudfs > > daffodil-udf/src/test/scala/org/sbadudfs/functionclasses/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider > > daffodil-udf/src/test/scala/org/sbadudfs/functionclasses2/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider > daffodil-udf/src/test/scala/org/sbadudfs/udfexceptions > > daffodil-udf/src/test/scala/org/sbadudfs/udfexceptions/evaluating/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider > daffodil-udf/src/test/scala/org/sbadudfs/udfexceptions2 > > daffodil-udf/src/test/scala/org/sbadudfs/udfexceptions2/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider > daffodil-udf/src/test/scala/org/sbadudfs/udfpexceptions > > daffodil-udf/src/test/scala/org/sbadudfs/udfpexceptions/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider > daffodil-udf/src/test/scala/org/sbadudfs/udfpexceptions2 > > daffodil-udf/src/test/scala/org/sbadudfs/udfpexceptions2/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider > daffodil-udf/src/test/scala/org/sgoodudfs > interran@GH3WPL13E:~/apache/incubator-daffodil-asf$ rg ServiceLoader > daffodil-udf/README.md > 36:This class will act as a traditional service provider as explained in > the ServiceLoader API, and must have a > *META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider* > file in its project. This file must contain the fully qualified name(s) of > the **provider class(es)** in the JAR. Without that file, neither this > class nor any of the User Defined Function classes it provides will be > visible to Daffodil. > 86:Each UDF is registered by including the fully qualified name of its > provider in a text file named > `META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider`. > The META-INF folder must be accessible from the root of whatever paths are > on the classpath, otherwise it won't be picked up by ServiceLoader. > > > daffodil-udf/src/main/java/org/apache/daffodil/udf/UserDefinedFunctionProvider.java > 21: * Abstract class used by ServiceLoader to poll for UDF providers on > classpath. > > > daffodil-runtime1/src/main/scala/org/apache/daffodil/udf/UserDefinedFunctionService.scala > 22:import java.util.ServiceLoader > 83: val loader: ServiceLoader[UserDefinedFunctionProvider] = > ServiceLoader.load(classOf[UserDefinedFunctionProvider]) > 185: * We catch any errors thrown by the ServiceLoader here. This > usually means UDFP > interran@GH3WPL13E:~/apache/incubator-daffodil-asf$ > > John > > -----Original Message----- > From: Wass, John L <wa...@ctc.com> > Sent: Tuesday, September 29, 2020 10:20 AM > To: dev@daffodil.apache.org > Subject: EXT: Validator SPI proposal > > Greetings, > > Please consider the following proposal to extend the Daffodil Infoset > Validation API. The proposed changes support deploying custom validation > implementations that are not built as part of the Daffodil distribution but > are instead made available at runtime as Java Service Provider Interface > (SPI) [1] "plug-ins". > > The intent here is to enable a wide range of validation approaches without > increasing overhead for the Daffodil project, while increasing the velocity > at which such implementations can be deployed. > > To support the discussion there is a minimally functional reference > implementation for Daffodil[2] and sample application using Schematron in a > standalone project[3]. > > I look forward to discussing the approach in more detail. > > > Approach > --- > > 1. Extract a Validator interface that describes validation behavior. > 2. Detect implementations of this interface at runtime using SPI. > 3. Parse additional validation arguments from CLI 4. Pass "Custom" > validators through the existing api.ValidationMode. > 5. Change ParseResult to execute validation through a SPI provided > instance. > > - Instances of the Validator are accessed at runtime using SPI metadata > from META-INF. > - The existing Validator behavior remains and is installed as the > "default" behavior. > - The current CLI arguments for validation would not change, but an > extended set of parse patterns is added. > > > CLI Usage > --- > > In the Schematron sample application there are a few CLI patterns > impemented for reference. > > The simplest usage, using the BMP schema, is > > `daffodil parse --schema data/bmp.dfdl.xsd --validate sch=data/bmp.sch > data/MARBLES.BMP` > > Where 'sch' is the lookup name for the SPI validator and following the '=' > is an argument which points to the schematron to use. > > There are other argument configurations that will need discussed. > > > Unknowns > --- > > - How to approach breaking changes in the Validator API > - How to evolve serialized API objects to prevent breakage in existing > serialized objects (specifically from daffodil.api.ValidationMode) > - Is there a better overall approach to this :P > > > > 1. https://docs.oracle.com/javase/tutorial/ext/basics/spi.html > 2. https://github.com/ctc-oss/incubator-daffodil > 3. https://github.com/ctc-oss/daffodil-schematron-validator > > > > -- > John Wass > Software Engineer > Concurrent Technologies Corporation > > > ----------------------------------------------------------------- > This message and any files transmitted within are intended solely for the > addressee or its representative and may contain company proprietary > information. If you are not the intended recipient, notify the sender > immediately and delete this message. Publication, reproduction, > forwarding, or content disclosure is prohibited without the consent of the > original sender and may be unlawful. > > Concurrent Technologies Corporation and its Affiliates. > www.ctc.com<http://www.ctc.com> 1-800-282-4392 > ----------------------------------------------------------------- > >