Based on the feedback it sounds like the approach is sane enough to put
together a PR.

Thanks all for the reviews and feedback.


On Thu, Oct 1, 2020 at 6:11 PM Beckerle, Mike <mbecke...@owlcyberdefense.com>
wrote:

> FYI: John Wass - I am also going to surf your code a bit so may have more
> comments. I've fetched from your ctc-oss repositories.
>
> One thing the UDF code did work through is how to define a SPI-based
> feature for Daffodil and also include test-specific instances of it and
> test them, all in the daffodil source tree.
>
> ________________________________
> From: Beckerle, Mike <mbecke...@owlcyberdefense.com>
> Sent: Thursday, October 1, 2020 5:52 PM
> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> Subject: Re: Validator SPI proposal
>
> A few thoughts on top of John Interrante's review.
>
> The validator code/clases being found via SPI seems good. Sharing
> code/library with the existing usage for UDFs would be nice if it works out.
>
> The validator code reads in various "specs".
>
> For XML Schema validation with xerces, this is the XML Schema (which is
> also the DFDL schema).
>
> For Schematron validation, I know some people have asked for the ability
> to express the schematron rules on the DFDL schema as added schema
> annotation elements, positioning them on elements and having the "." path
> expression refer to the element corresonding to the element declaration
> upon which the rule is placed. They end up looking somewhat like DFDL's
> assertions, but the schematron rules use full XPath, and so can do somewhat
> more, and they are operating on the XML Infoset, not the DFDL Infoset.
>
> But regardless of whether the schematron rules are extracted from the DFDL
> schema or from another file, the schematron validator, just like xerces,
> effectively has to compile that "spec" information into an internal data
> structure that enables fast validation.
>
> So a requirement is that this happens once only, at startup time
> regardless of how many times parse/unparse are called.
>
> Ideally, one would be able to serialize the result of this compilation
> i.e., save and serialize the validator so that it needn't be recompiled at
> all if reloaded. If this compiled validator is serializable, then just
> making that value a member of the SchemaSetRuntimeData class should do it,
> as that object and all its members get serialized now.
>
> So if possible the validator API should accommodate this
> compile/save/reload cycle.
>
> btw: daffodil has validation options for parse, but not for unparse
> currently. It should have the option to validate the incoming infoset
> before unparsing as well.
>
> Re: Your "unknowns"
>
>      - How to approach breaking changes in the Validator API
>
> This is a general issue with Daffodil APIs. I think we have previously
> adopted a posture of that we would support API change by retaining existing
> but deprecated APIs for a release or two before phasing them out. We try to
> sort these out in design discussions of APIs or in Pull-Request reviews
> that have API changes in them.
>
> - How to evolve serialized API objects to prevent breakage in existing
> serialized objects (specifically from daffodil.api.ValidationMode)
>
> We have heretofore punted this in Daffodil generally. Saved
> parser/unparsers are version specific. If we want to fix this we should use
> a general approach for all Daffodil's serialized objects.  You are
> proposing to change ValidationMode so that really, it's not an enum any
> more, it can use identifiers that are pulled from classpath/SPI objects
> found.
>
> In that case the code that uses ValidationMode  will have to change to use
> something more general. Probably ValidationMode itself has to go away as a
> concept replaced by a ValidatorSpec class which can be constructed from a
> string.
>
> Then maybe ValidationMode.on isn't an enum at all any more, but a method
> that returns a singleton ValidationSpec for the xerces built in validator?
>
> - Is there a better overall approach to this  :P
>
> Gotta start somewhere.
>
> -mikeb
>
> ________________________________
> From: Wass, John L <wa...@ctc.com>
> Sent: Wednesday, September 30, 2020 3:19 PM
> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> Subject: Re: Validator SPI proposal
>
> Thanks for the review John.
>
> > Then how do you combine your forked Daffodil and sample schemetron
> implementation/application together so that your simplest usage example
> actually works?
>
> Sure.  The missing instructions are below and they were also added to the
> readme in the sample app repo.
>
> ---
>
> 1. From the root of daffodil; stage the cli package
> `sbt daffodil-cli/universal:stage`
>
> 2. From `daffodil-cli/target/universal/stage`; run the application,
> verifying it fails as expected due to missing schematron validator jar
> `./bin/daffodil parse --schema $data_dir/bmp.dfdl.xsd --validate
> sch=$data_dir/bmp.sch $data_dir/MARBLES.BMP`
> Should result in
> `[error] Bad arguments for option 'validate': 'sch=/sample/data/bmp.sch' -
> Unrecognized ValidationMode sch=/sample/data/bmp.sch.  Must be 'on',
> 'limited', 'off', or name of spi validator.`
>
> 3. From the root of schematron validator; create an assembly jar
> `sbt assembly`
>
> 4. From `daffodil-schematron-validator/target/scala-2.12`; copy the
> validator jar to the staged daffodil-cli application lib dir
> `daffodil-cli/target/universal/stage/lib`
>
> 5. From `daffodil-cli/target/universal/stage`; run the application
> `./bin/daffodil parse --schema $data_dir/bmp.dfdl.xsd --validate
> sch=$data_dir/bmp.sch $data_dir/MARBLES.BMP`
>
> 6. See the parsed BMP with schematron validation status dumped to stdout.
>
>
> Note the exported path to the schematron validator data dir as `data_dir`
> in the examples above.
>
>
> ---
>
>
> > I'd have to do some research before I could say something about your
> bullet items for discussion:
>
> Thanks. I am actively thinking about these and welcome input.  Will follow
> up with additional thoughts...
>
>
> > FYI, Daffodil already uses the ServiceLoader API to load user defined
> functions (daffodil-udf)
>
> Excellent. I did search for SPI related things when first looking at this
> but somehow missed that implementation.  I'll review it.
>
>
> Appreciate the feedback, and looking forward to hearing if the app runs
> for you :)
>
>
> john
>
>
> ________________________________
> From: Interrante, John A (GE Research, US) <inter...@research.ge.com>
> Sent: Tuesday, September 29, 2020 5:34 PM
> To: dev@daffodil.apache.org
> Subject: RE: Validator SPI proposal
>
> Hello John,
>
> Using ServiceLoader looks reasonable.  I looked at your reference
> implementation and sample application, but can you clarify a question for
> me?  First you build your forked Daffodil and your sample application
> separately in different directories.   Then how do you combine your forked
> Daffodil and sample schemetron implementation/application together so that
> your simplest usage example actually works?  That is, do you need to do
> step 1 below?
>
> 1.  Copy a jar from daffodil-schematron-validator/target/... to
> incubator-daffodil/daffodil-cli/target/universal/stage/lib?
>         $ <please fill in this step>
> 2.  Define an alias (or create a symbolic link) to allow you to run your
> freshly built daffodil executable?
>         $ alias
> daffodil="$HOME/incubator-daffodil/daffodil-cli/target/universal/stage/bin/daffodil"
> 3.  Run your simplest usage example?
>         $ cd daffodil-schematron-validator
>         $ daffodil parse --schema data/bmp.dfdl.xsd --validate
> sch=data/bmp.sch data/MARBLES.BMP
>
> I'd have to do some research before I could say something about your
> bullet items for discussion:
>
> - How to approach breaking changes in the Validator API
> - How to evolve serialized API objects to prevent breakage in existing
> serialized objects (specifically from daffodil.api.ValidationMode)
> - Is there a better overall approach to this
>
> FYI, Daffodil already uses the ServiceLoader API to load user defined
> functions (daffodil-udf).  I don't know much about the UDF files; I found
> them only because I searched for any occurrences of ServiceLoader in
> Daffodil.  I don't know if you have seen these files and whether any of
> them informed your implementation, but I'll append a list of the UDF files
> for you to look at.
>
> interran@GH3WPL13E:~/apache/incubator-daffodil-asf$ fd udf
> daffodil-cli/src/it/scala/org/apache/daffodil/udf
> daffodil-cli/src/it/scala/org/apache/daffodil/udf/TestCLIUdfs.scala
> daffodil-runtime1/src/main/scala/org/apache/daffodil/udf
>
> daffodil-test/src/test/resources/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
> daffodil-test/src/test/resources/org/apache/daffodil/udf
> daffodil-test/src/test/resources/org/apache/daffodil/udf/udfs.tdml
> daffodil-test/src/test/scala/org/apache/daffodil/udf
>
> daffodil-test/src/test/scala/org/apache/daffodil/udf/TestUdfsInSchemas.scala
> daffodil-udf
> daffodil-udf/src/main/java/org/apache/daffodil/udf
> daffodil-udf/src/test/java/org/badudfs
>
> daffodil-udf/src/test/java/org/badudfs/annotations/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
>
> daffodil-udf/src/test/java/org/badudfs/evaluate/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
>
> daffodil-udf/src/test/java/org/badudfs/functionclasses1/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
>
> daffodil-udf/src/test/java/org/badudfs/functionclasses2/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
> daffodil-udf/src/test/java/org/badudfs/nonUDF
>
> daffodil-udf/src/test/java/org/badudfs/nonUDF/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
> daffodil-udf/src/test/java/org/jgoodudfs
> daffodil-udf/src/test/resources/org/apache/daffodil/udf
>
> daffodil-udf/src/test/resources/org/apache/daffodil/udf/genericUdfSchema.xsd
>
> daffodil-udf/src/test/resources/org/badmetainf/nonexistentclass/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
>
> daffodil-udf/src/test/resources/org/goodmetainf/IntegerFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
>
> daffodil-udf/src/test/resources/org/goodmetainf/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
> daffodil-udf/src/test/scala/org/sbadudfs
>
> daffodil-udf/src/test/scala/org/sbadudfs/functionclasses/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
>
> daffodil-udf/src/test/scala/org/sbadudfs/functionclasses2/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
> daffodil-udf/src/test/scala/org/sbadudfs/udfexceptions
>
> daffodil-udf/src/test/scala/org/sbadudfs/udfexceptions/evaluating/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
> daffodil-udf/src/test/scala/org/sbadudfs/udfexceptions2
>
> daffodil-udf/src/test/scala/org/sbadudfs/udfexceptions2/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
> daffodil-udf/src/test/scala/org/sbadudfs/udfpexceptions
>
> daffodil-udf/src/test/scala/org/sbadudfs/udfpexceptions/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
> daffodil-udf/src/test/scala/org/sbadudfs/udfpexceptions2
>
> daffodil-udf/src/test/scala/org/sbadudfs/udfpexceptions2/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
> daffodil-udf/src/test/scala/org/sgoodudfs
> interran@GH3WPL13E:~/apache/incubator-daffodil-asf$ rg ServiceLoader
> daffodil-udf/README.md
> 36:This class will act as a traditional service provider as explained in
> the ServiceLoader API, and must have a
> *META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider*
> file in its project. This file must contain the fully qualified name(s) of
> the **provider class(es)** in the JAR. Without that file, neither this
> class nor any of the User Defined Function classes it provides will be
> visible to Daffodil.
> 86:Each UDF is registered by including the fully qualified name of its
> provider in a text file named
> `META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider`.
> The META-INF folder must be accessible from the root of whatever paths are
> on the classpath, otherwise it won't be picked up by ServiceLoader.
>
>
> daffodil-udf/src/main/java/org/apache/daffodil/udf/UserDefinedFunctionProvider.java
> 21: * Abstract class used by ServiceLoader to poll for UDF providers on
> classpath.
>
>
> daffodil-runtime1/src/main/scala/org/apache/daffodil/udf/UserDefinedFunctionService.scala
> 22:import java.util.ServiceLoader
> 83:    val loader: ServiceLoader[UserDefinedFunctionProvider] =
> ServiceLoader.load(classOf[UserDefinedFunctionProvider])
> 185:     * We catch any errors thrown by the ServiceLoader here. This
> usually means UDFP
> interran@GH3WPL13E:~/apache/incubator-daffodil-asf$
>
> John
>
> -----Original Message-----
> From: Wass, John L <wa...@ctc.com>
> Sent: Tuesday, September 29, 2020 10:20 AM
> To: dev@daffodil.apache.org
> Subject: EXT: Validator SPI proposal
>
> Greetings,
>
> Please consider the following proposal to extend the Daffodil Infoset
> Validation API.  The proposed changes support deploying custom validation
> implementations that are not built as part of the Daffodil distribution but
> are instead made available at runtime as Java Service Provider Interface
> (SPI) [1] "plug-ins".
>
> The intent here is to enable a wide range of validation approaches without
> increasing overhead for the Daffodil project, while increasing the velocity
> at which such implementations can be deployed.
>
> To support the discussion there is a minimally functional reference
> implementation for Daffodil[2] and sample application using Schematron in a
> standalone project[3].
>
> I look forward to discussing the approach in more detail.
>
>
> Approach
> ---
>
> 1. Extract a Validator interface that describes validation behavior.
> 2. Detect implementations of this interface at runtime using SPI.
> 3. Parse additional validation arguments from CLI 4. Pass "Custom"
> validators through the existing api.ValidationMode.
> 5. Change ParseResult to execute validation through a SPI provided
> instance.
>
> - Instances of the Validator are accessed at runtime using SPI metadata
> from META-INF.
> - The existing Validator behavior remains and is installed as the
> "default" behavior.
> - The current CLI arguments for validation would not change, but an
> extended set of parse patterns is added.
>
>
> CLI Usage
> ---
>
> In the Schematron sample application there are a few CLI patterns
> impemented for reference.
>
> The simplest usage, using the BMP schema, is
>
> `daffodil parse --schema data/bmp.dfdl.xsd --validate sch=data/bmp.sch
> data/MARBLES.BMP`
>
> Where 'sch' is the lookup name for the SPI validator and following the '='
> is an argument which points to the schematron to use.
>
> There are other argument configurations that will need discussed.
>
>
> Unknowns
> ---
>
> - How to approach breaking changes in the Validator API
> - How to evolve serialized API objects to prevent breakage in existing
> serialized objects (specifically from daffodil.api.ValidationMode)
> - Is there a better overall approach to this  :P
>
>
>
> 1. https://docs.oracle.com/javase/tutorial/ext/basics/spi.html
> 2. https://github.com/ctc-oss/incubator-daffodil
> 3. https://github.com/ctc-oss/daffodil-schematron-validator
>
>
>
> --
> John Wass
> Software Engineer
> Concurrent Technologies Corporation
>
>
> -----------------------------------------------------------------
> This message and any files transmitted within are intended solely for the
> addressee or its representative and may contain company proprietary
> information.  If you are not the intended recipient, notify the sender
> immediately and delete this message.  Publication, reproduction,
> forwarding, or content disclosure is prohibited without the consent of the
> original sender and may be unlawful.
>
> Concurrent Technologies Corporation and its Affiliates.
> www.ctc.com<http://www.ctc.com>  1-800-282-4392
> -----------------------------------------------------------------
>
>

Reply via email to