PR is in at https://github.com/apache/incubator-daffodil/pull/431



On Wed, Oct 7, 2020 at 9:43 AM john wass <jwa...@gmail.com> wrote:

> Based on the feedback it sounds like the approach is sane enough to put
> together a PR.
>
> Thanks all for the reviews and feedback.
>
>
> On Thu, Oct 1, 2020 at 6:11 PM Beckerle, Mike <
> mbecke...@owlcyberdefense.com> wrote:
>
>> FYI: John Wass - I am also going to surf your code a bit so may have more
>> comments. I've fetched from your ctc-oss repositories.
>>
>> One thing the UDF code did work through is how to define a SPI-based
>> feature for Daffodil and also include test-specific instances of it and
>> test them, all in the daffodil source tree.
>>
>> ________________________________
>> From: Beckerle, Mike <mbecke...@owlcyberdefense.com>
>> Sent: Thursday, October 1, 2020 5:52 PM
>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>> Subject: Re: Validator SPI proposal
>>
>> A few thoughts on top of John Interrante's review.
>>
>> The validator code/classes being found via SPI seems good. Sharing
>> code/library with the existing usage for UDFs would be nice if it works out.
>>
>> The validator code reads in various "specs".
>>
>> For XML Schema validation with xerces, this is the XML Schema (which is
>> also the DFDL schema).
>>
>> For Schematron validation, I know some people have asked for the ability
>> to express the schematron rules on the DFDL schema as added schema
>> annotation elements, positioning them on elements and having the "." path
>> expression refer to the element corresponding to the element declaration
>> upon which the rule is placed. They end up looking somewhat like DFDL's
>> assertions, but the schematron rules use full XPath, and so can do somewhat
>> more, and they are operating on the XML Infoset, not the DFDL Infoset.
>>
>> But regardless of whether the schematron rules are extracted from the
>> DFDL schema or from another file, the schematron validator, just like
>> xerces, effectively has to compile that "spec" information into an internal
>> data structure that enables fast validation.
>>
>> So a requirement is that this happens once only, at startup time
>> regardless of how many times parse/unparse are called.
>>
>> Ideally, one would be able to serialize the result of this compilation
>> i.e., save and serialize the validator so that it needn't be recompiled at
>> all if reloaded. If this compiled validator is serializable, then just
>> making that value a member of the SchemaSetRuntimeData class should do it,
>> as that object and all its members get serialized now.
>>
>> So if possible the validator API should accommodate this
>> compile/save/reload cycle.
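
A minimal sketch of the compile-once part of that requirement, using the JDK's javax.xml.validation API rather than Daffodil's own classes. The class name CompiledXsdValidator and the tiny inline schema are made up for illustration, and the sketch does not attempt the save/reload half of the cycle.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical holder: compiles the "spec" once, then validates many documents.
final class CompiledXsdValidator {
    private final Schema schema; // compiled form, built a single time in the constructor

    CompiledXsdValidator(String xsdText) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            this.schema = factory.newSchema(new StreamSource(new StringReader(xsdText)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema did not compile", e);
        }
    }

    // Returns true when the document is valid against the already-compiled schema.
    boolean isValid(String xml) {
        try {
            Validator validator = schema.newValidator(); // per-call object, no recompilation
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation errors surface as SAXException by default
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class CompileOnceDemo {
    // Illustrative schema only; not taken from any DFDL schema in this thread.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='width' type='xs:int'/>"
      + "</xs:schema>";

    public static void main(String[] args) {
        CompiledXsdValidator validator = new CompiledXsdValidator(XSD); // startup cost paid here
        System.out.println(validator.isValid("<width>1419</width>"));
        System.out.println(validator.isValid("<width>wide</width>"));
    }
}
```

The point of the shape is that the expensive step lives in the constructor, so repeated parse calls only pay for isValid.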
>>
>> btw: daffodil has validation options for parse, but not for unparse
>> currently. It should have the option to validate the incoming infoset
>> before unparsing as well.
>>
>> Re: Your "unknowns"
>>
>>      - How to approach breaking changes in the Validator API
>>
>> This is a general issue with Daffodil APIs. I think we have previously
>> adopted a posture that we would support API change by retaining existing
>> but deprecated APIs for a release or two before phasing them out. We try to
>> sort these out in design discussions of APIs or in Pull-Request reviews
>> that have API changes in them.
>>
>> - How to evolve serialized API objects to prevent breakage in existing
>> serialized objects (specifically from daffodil.api.ValidationMode)
>>
>> We have heretofore punted this in Daffodil generally. Saved
>> parser/unparsers are version specific. If we want to fix this we should use
>> a general approach for all Daffodil's serialized objects.  You are
>> proposing to change ValidationMode so that really, it's not an enum any
>> more, it can use identifiers that are pulled from classpath/SPI objects
>> found.
>>
>> In that case the code that uses ValidationMode  will have to change to
>> use something more general. Probably ValidationMode itself has to go away
>> as a concept replaced by a ValidatorSpec class which can be constructed
>> from a string.
>>
>> Then maybe ValidationMode.on isn't an enum at all any more, but a method
>> that returns a singleton ValidatorSpec for the xerces built in validator?
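
One way the string-constructed spec could look, as a rough sketch. ValidatorSpec, fromString, and the ON singleton are hypothetical names based on the suggestion above, not existing Daffodil API; the name=argument split mirrors the `sch=...` form used on the CLI in this thread.

```java
import java.util.Objects;

// Hypothetical stand-in for an enum-style ValidationMode.
final class ValidatorSpec {
    // Singleton for the built-in validator, i.e. what "on" selects.
    static final ValidatorSpec ON = new ValidatorSpec("on", "");

    private final String name;     // lookup name of the validator
    private final String argument; // text after '=', empty when absent

    private ValidatorSpec(String name, String argument) {
        this.name = name;
        this.argument = argument;
    }

    // Accepts "name" or "name=argument"; only the first '=' is a separator.
    static ValidatorSpec fromString(String text) {
        Objects.requireNonNull(text, "text");
        int eq = text.indexOf('=');
        String name = eq < 0 ? text : text.substring(0, eq);
        String argument = eq < 0 ? "" : text.substring(eq + 1);
        if (name.isEmpty()) {
            throw new IllegalArgumentException("missing validator name: '" + text + "'");
        }
        if (name.equals("on") && argument.isEmpty()) {
            return ON; // keep the built-in case a shared instance
        }
        return new ValidatorSpec(name, argument);
    }

    String name() { return name; }
    String argument() { return argument; }
}

class ValidatorSpecDemo {
    public static void main(String[] args) {
        ValidatorSpec spec = ValidatorSpec.fromString("sch=data/bmp.sch");
        System.out.println(spec.name() + " -> " + spec.argument());
        System.out.println(ValidatorSpec.fromString("on") == ValidatorSpec.ON);
    }
}
```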
>>
>> - Is there a better overall approach to this  :P
>>
>> Gotta start somewhere.
>>
>> -mikeb
>>
>> ________________________________
>> From: Wass, John L <wa...@ctc.com>
>> Sent: Wednesday, September 30, 2020 3:19 PM
>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>> Subject: Re: Validator SPI proposal
>>
>> Thanks for the review John.
>>
>> > Then how do you combine your forked Daffodil and sample schematron
>> implementation/application together so that your simplest usage example
>> actually works?
>>
>> Sure.  The missing instructions are below and they were also added to the
>> readme in the sample app repo.
>>
>> ---
>>
>> 1. From the root of daffodil; stage the cli package
>> `sbt daffodil-cli/universal:stage`
>>
>> 2. From `daffodil-cli/target/universal/stage`; run the application,
>> verifying it fails as expected due to missing schematron validator jar
>> `./bin/daffodil parse --schema $data_dir/bmp.dfdl.xsd --validate
>> sch=$data_dir/bmp.sch $data_dir/MARBLES.BMP`
>> Should result in
>> `[error] Bad arguments for option 'validate': 'sch=/sample/data/bmp.sch'
>> - Unrecognized ValidationMode sch=/sample/data/bmp.sch.  Must be 'on',
>> 'limited', 'off', or name of spi validator.`
>>
>> 3. From the root of schematron validator; create an assembly jar
>> `sbt assembly`
>>
>> 4. From `daffodil-schematron-validator/target/scala-2.12`; copy the
>> validator jar to the staged daffodil-cli application lib dir
>> `daffodil-cli/target/universal/stage/lib`
>>
>> 5. From `daffodil-cli/target/universal/stage`; run the application
>> `./bin/daffodil parse --schema $data_dir/bmp.dfdl.xsd --validate
>> sch=$data_dir/bmp.sch $data_dir/MARBLES.BMP`
>>
>> 6. See the parsed BMP with schematron validation status dumped to stdout.
>>
>>
>> Note the exported path to the schematron validator data dir as `data_dir`
>> in the examples above.
>>
>>
>> ---
>>
>>
>> > I'd have to do some research before I could say something about your
>> bullet items for discussion:
>>
>> Thanks. I am actively thinking about these and welcome input.  Will
>> follow up with additional thoughts...
>>
>>
>> > FYI, Daffodil already uses the ServiceLoader API to load user defined
>> functions (daffodil-udf)
>>
>> Excellent. I did search for SPI related things when first looking at this
>> but somehow missed that implementation.  I'll review it.
>>
>>
>> Appreciate the feedback, and looking forward to hearing if the app runs
>> for you :)
>>
>>
>> john
>>
>>
>> ________________________________
>> From: Interrante, John A (GE Research, US) <inter...@research.ge.com>
>> Sent: Tuesday, September 29, 2020 5:34 PM
>> To: dev@daffodil.apache.org
>> Subject: RE: Validator SPI proposal
>>
>> Hello John,
>>
>> Using ServiceLoader looks reasonable.  I looked at your reference
>> implementation and sample application, but can you clarify a question for
>> me?  First you build your forked Daffodil and your sample application
>> separately in different directories.   Then how do you combine your forked
>> Daffodil and sample schematron implementation/application together so that
>> your simplest usage example actually works?  That is, do you need to do
>> step 1 below?
>>
>> 1.  Copy a jar from daffodil-schematron-validator/target/... to
>> incubator-daffodil/daffodil-cli/target/universal/stage/lib?
>>         $ <please fill in this step>
>> 2.  Define an alias (or create a symbolic link) to allow you to run your
>> freshly built daffodil executable?
>>         $ alias daffodil="$HOME/incubator-daffodil/daffodil-cli/target/universal/stage/bin/daffodil"
>> 3.  Run your simplest usage example?
>>         $ cd daffodil-schematron-validator
>>         $ daffodil parse --schema data/bmp.dfdl.xsd --validate sch=data/bmp.sch data/MARBLES.BMP
>>
>> I'd have to do some research before I could say something about your
>> bullet items for discussion:
>>
>> - How to approach breaking changes in the Validator API
>> - How to evolve serialized API objects to prevent breakage in existing
>> serialized objects (specifically from daffodil.api.ValidationMode)
>> - Is there a better overall approach to this
>>
>> FYI, Daffodil already uses the ServiceLoader API to load user defined
>> functions (daffodil-udf).  I don't know much about the UDF files; I found
>> them only because I searched for any occurrences of ServiceLoader in
>> Daffodil.  I don't know if you have seen these files and whether any of
>> them informed your implementation, but I'll append a list of the UDF files
>> for you to look at.
>>
>> interran@GH3WPL13E:~/apache/incubator-daffodil-asf$ fd udf
>> daffodil-cli/src/it/scala/org/apache/daffodil/udf
>> daffodil-cli/src/it/scala/org/apache/daffodil/udf/TestCLIUdfs.scala
>> daffodil-runtime1/src/main/scala/org/apache/daffodil/udf
>> daffodil-test/src/test/resources/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
>> daffodil-test/src/test/resources/org/apache/daffodil/udf
>> daffodil-test/src/test/resources/org/apache/daffodil/udf/udfs.tdml
>> daffodil-test/src/test/scala/org/apache/daffodil/udf
>> daffodil-test/src/test/scala/org/apache/daffodil/udf/TestUdfsInSchemas.scala
>> daffodil-udf
>> daffodil-udf/src/main/java/org/apache/daffodil/udf
>> daffodil-udf/src/test/java/org/badudfs
>> daffodil-udf/src/test/java/org/badudfs/annotations/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
>> daffodil-udf/src/test/java/org/badudfs/evaluate/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
>> daffodil-udf/src/test/java/org/badudfs/functionclasses1/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
>> daffodil-udf/src/test/java/org/badudfs/functionclasses2/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
>> daffodil-udf/src/test/java/org/badudfs/nonUDF
>> daffodil-udf/src/test/java/org/badudfs/nonUDF/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
>> daffodil-udf/src/test/java/org/jgoodudfs
>> daffodil-udf/src/test/resources/org/apache/daffodil/udf
>> daffodil-udf/src/test/resources/org/apache/daffodil/udf/genericUdfSchema.xsd
>> daffodil-udf/src/test/resources/org/badmetainf/nonexistentclass/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
>> daffodil-udf/src/test/resources/org/goodmetainf/IntegerFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
>> daffodil-udf/src/test/resources/org/goodmetainf/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
>> daffodil-udf/src/test/scala/org/sbadudfs
>> daffodil-udf/src/test/scala/org/sbadudfs/functionclasses/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
>> daffodil-udf/src/test/scala/org/sbadudfs/functionclasses2/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
>> daffodil-udf/src/test/scala/org/sbadudfs/udfexceptions
>> daffodil-udf/src/test/scala/org/sbadudfs/udfexceptions/evaluating/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
>> daffodil-udf/src/test/scala/org/sbadudfs/udfexceptions2
>> daffodil-udf/src/test/scala/org/sbadudfs/udfexceptions2/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
>> daffodil-udf/src/test/scala/org/sbadudfs/udfpexceptions
>> daffodil-udf/src/test/scala/org/sbadudfs/udfpexceptions/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
>> daffodil-udf/src/test/scala/org/sbadudfs/udfpexceptions2
>> daffodil-udf/src/test/scala/org/sbadudfs/udfpexceptions2/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
>> daffodil-udf/src/test/scala/org/sgoodudfs
>> interran@GH3WPL13E:~/apache/incubator-daffodil-asf$ rg ServiceLoader
>> daffodil-udf/README.md
>> 36:This class will act as a traditional service provider as explained in the ServiceLoader API, and must have a *META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider* file in its project. This file must contain the fully qualified name(s) of the **provider class(es)** in the JAR. Without that file, neither this class nor any of the User Defined Function classes it provides will be visible to Daffodil.
>> 86:Each UDF is registered by including the fully qualified name of its provider in a text file named `META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider`. The META-INF folder must be accessible from the root of whatever paths are on the classpath, otherwise it won't be picked up by ServiceLoader.
>>
>> daffodil-udf/src/main/java/org/apache/daffodil/udf/UserDefinedFunctionProvider.java
>> 21: * Abstract class used by ServiceLoader to poll for UDF providers on classpath.
>>
>> daffodil-runtime1/src/main/scala/org/apache/daffodil/udf/UserDefinedFunctionService.scala
>> 22:import java.util.ServiceLoader
>> 83:    val loader: ServiceLoader[UserDefinedFunctionProvider] = ServiceLoader.load(classOf[UserDefinedFunctionProvider])
>> 185:     * We catch any errors thrown by the ServiceLoader here. This usually means UDFP
>> interran@GH3WPL13E:~/apache/incubator-daffodil-asf$
>>
>> John
>>
>> -----Original Message-----
>> From: Wass, John L <wa...@ctc.com>
>> Sent: Tuesday, September 29, 2020 10:20 AM
>> To: dev@daffodil.apache.org
>> Subject: EXT: Validator SPI proposal
>>
>> Greetings,
>>
>> Please consider the following proposal to extend the Daffodil Infoset
>> Validation API.  The proposed changes support deploying custom validation
>> implementations that are not built as part of the Daffodil distribution but
>> are instead made available at runtime as Java Service Provider Interface
>> (SPI) [1] "plug-ins".
>>
>> The intent here is to enable a wide range of validation approaches
>> without increasing overhead for the Daffodil project, while increasing the
>> velocity at which such implementations can be deployed.
>>
>> To support the discussion there is a minimally functional reference
>> implementation for Daffodil[2] and sample application using Schematron in a
>> standalone project[3].
>>
>> I look forward to discussing the approach in more detail.
>>
>>
>> Approach
>> ---
>>
>> 1. Extract a Validator interface that describes validation behavior.
>> 2. Detect implementations of this interface at runtime using SPI.
>> 3. Parse additional validation arguments from CLI
>> 4. Pass "Custom" validators through the existing api.ValidationMode.
>> 5. Change ParseResult to execute validation through a SPI provided
>> instance.
>>
>> - Instances of the Validator are accessed at runtime using SPI metadata
>> from META-INF.
>> - The existing Validator behavior remains and is installed as the
>> "default" behavior.
>> - The current CLI arguments for validation would not change, but an
>> extended set of parse patterns is added.
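
A rough sketch of steps 1 and 2, assuming a hypothetical InfosetValidator interface (the interface in the reference implementation may well differ). ServiceLoader finds providers listed in a META-INF/services file named after the interface; with nothing registered, the lookup simply comes back empty. DefaultValidator here is only a placeholder for the built-in behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical plug-in contract, for illustration only.
interface InfosetValidator {
    String name();                            // lookup name, e.g. "sch"
    List<String> validate(String infosetXml); // empty list means no errors
}

// Placeholder for the existing built-in behavior.
final class DefaultValidator implements InfosetValidator {
    public String name() { return "default"; }

    public List<String> validate(String infosetXml) {
        List<String> errors = new ArrayList<>();
        if (infosetXml == null || infosetXml.isEmpty()) {
            errors.add("empty infoset");
        }
        return errors;
    }
}

final class Validators {
    private Validators() {}

    // Search any set of providers by name; ServiceLoader is itself an Iterable.
    static Optional<InfosetValidator> find(Iterable<InfosetValidator> providers, String name) {
        for (InfosetValidator candidate : providers) {
            if (candidate.name().equals(name)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    // Runtime discovery: providers come from META-INF/services registrations on the classpath.
    static Optional<InfosetValidator> byName(String name) {
        return find(ServiceLoader.load(InfosetValidator.class), name);
    }
}

class ValidatorLookupDemo {
    public static void main(String[] args) {
        // No provider jar is on the classpath in this sketch, so "sch" is not found.
        System.out.println(Validators.byName("sch").isPresent());

        // The same lookup over an explicit list, standing in for discovered providers.
        List<InfosetValidator> known = Arrays.asList(new DefaultValidator());
        System.out.println(Validators.find(known, "default").isPresent());
    }
}
```

Returning an Optional leaves it to the caller to report an unrecognized name, which matches the "Unrecognized ValidationMode" style of error shown elsewhere in this thread.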
>>
>>
>> CLI Usage
>> ---
>>
>> In the Schematron sample application there are a few CLI patterns
>> implemented for reference.
>>
>> The simplest usage, using the BMP schema, is
>>
>> `daffodil parse --schema data/bmp.dfdl.xsd --validate sch=data/bmp.sch
>> data/MARBLES.BMP`
>>
>> Where 'sch' is the lookup name for the SPI validator and following the
>> '=' is an argument which points to the schematron to use.
>>
>> There are other argument configurations that will need to be discussed.
>>
>>
>> Unknowns
>> ---
>>
>> - How to approach breaking changes in the Validator API
>> - How to evolve serialized API objects to prevent breakage in existing
>> serialized objects (specifically from daffodil.api.ValidationMode)
>> - Is there a better overall approach to this  :P
>>
>>
>>
>> 1. https://docs.oracle.com/javase/tutorial/ext/basics/spi.html
>> 2. https://github.com/ctc-oss/incubator-daffodil
>> 3. https://github.com/ctc-oss/daffodil-schematron-validator
>>
>>
>>
>> --
>> John Wass
>> Software Engineer
>> Concurrent Technologies Corporation
>>
>>
>>
>>
