FYI: John Wass - I am also going to surf your code a bit so may have more 
comments. I've fetched from your ctc-oss repositories.

One thing the UDF code did work through is how to define a SPI-based feature 
for Daffodil and also include test-specific instances of it and test them, all 
in the daffodil source tree.

________________________________
From: Beckerle, Mike <mbecke...@owlcyberdefense.com>
Sent: Thursday, October 1, 2020 5:52 PM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Re: Validator SPI proposal

A few thoughts on top of John Interrante's review.

The validator code/clases being found via SPI seems good. Sharing code/library 
with the existing usage for UDFs would be nice if it works out.

The validator code reads in various "specs".

For XML Schema validation with xerces, this is the XML Schema (which is also 
the DFDL schema).

For Schematron validation, I know some people have asked for the ability to 
express the schematron rules on the DFDL schema as added schema annotation 
elements, positioning them on elements and having the "." path expression refer 
to the element corresonding to the element declaration upon which the rule is 
placed. They end up looking somewhat like DFDL's assertions, but the schematron 
rules use full XPath, and so can do somewhat more, and they are operating on 
the XML Infoset, not the DFDL Infoset.

But regardless of whether the schematron rules are extracted from the DFDL 
schema or from another file, the schematron validator, just like xerces, 
effectively has to compile that "spec" information into an internal data 
structure that enables fast validation.

So a requirement is that this happens once only, at startup time regardless of 
how many times parse/unparse are called.

Ideally, one would be able to serialize the result of this compilation i.e., 
save and serialize the validator so that it needn't be recompiled at all if 
reloaded. If this compiled validator is serializable, then just making that 
value a member of the SchemaSetRuntimeData class should do it, as that object 
and all its members get serialized now.

So if possible the validator API should accommodate this compile/save/reload 
cycle.

btw: daffodil has validation options for parse, but not for unparse currently. 
It should have the option to validate the incoming infoset before unparsing as 
well.

Re: Your "unknowns"

     - How to approach breaking changes in the Validator API

This is a general issue with Daffodil APIs. I think we have previously adopted 
a posture of that we would support API change by retaining existing but 
deprecated APIs for a release or two before phasing them out. We try to sort 
these out in design discussions of APIs or in Pull-Request reviews that have 
API changes in them.

- How to evolve serialized API objects to prevent breakage in existing 
serialized objects (specifically from daffodil.api.ValidationMode)

We have heretofore punted this in Daffodil generally. Saved parser/unparsers 
are version specific. If we want to fix this we should use a general approach 
for all Daffodil's serialized objects.  You are proposing to change 
ValidationMode so that really, it's not an enum any more, it can use 
identifiers that are pulled from classpath/SPI objects found.

In that case the code that uses ValidationMode  will have to change to use 
something more general. Probably ValidationMode itself has to go away as a 
concept replaced by a ValidatorSpec class which can be constructed from a 
string.

Then maybe ValidationMode.on isn't an enum at all any more, but a method that 
returns a singleton ValidationSpec for the xerces built in validator?

- Is there a better overall approach to this  :P

Gotta start somewhere.

-mikeb

________________________________
From: Wass, John L <wa...@ctc.com>
Sent: Wednesday, September 30, 2020 3:19 PM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Re: Validator SPI proposal

Thanks for the review John.

> Then how do you combine your forked Daffodil and sample schemetron 
> implementation/application together so that your simplest usage example 
> actually works?

Sure.  The missing instructions are below and they were also added to the 
readme in the sample app repo.

---

1. From the root of daffodil; stage the cli package
`sbt daffodil-cli/universal:stage`

2. From `daffodil-cli/target/universal/stage`; run the application, verifying 
it fails as expected due to missing schematron validator jar
`./bin/daffodil parse --schema $data_dir/bmp.dfdl.xsd --validate 
sch=$data_dir/bmp.sch $data_dir/MARBLES.BMP`
Should result in
`[error] Bad arguments for option 'validate': 'sch=/sample/data/bmp.sch' - 
Unrecognized ValidationMode sch=/sample/data/bmp.sch.  Must be 'on', 'limited', 
'off', or name of spi validator.`

3. From the root of schematron validator; create an assembly jar
`sbt assembly`

4. From `daffodil-schematron-validator/target/scala-2.12`; copy the validator 
jar to the staged daffodil-cli application lib dir 
`daffodil-cli/target/universal/stage/lib`

5. From `daffodil-cli/target/universal/stage`; run the application
`./bin/daffodil parse --schema $data_dir/bmp.dfdl.xsd --validate 
sch=$data_dir/bmp.sch $data_dir/MARBLES.BMP`

6. See the parsed BMP with schematron validation status dumped to stdout.


Note the exported path to the schematron validator data dir as `data_dir` in 
the examples above.


---


> I'd have to do some research before I could say something about your bullet 
> items for discussion:

Thanks. I am actively thinking about these and welcome input.  Will follow up 
with additional thoughts...


> FYI, Daffodil already uses the ServiceLoader API to load user defined 
> functions (daffodil-udf)

Excellent. I did search for SPI related things when first looking at this but 
somehow missed that implementation.  I'll review it.


Appreciate the feedback, and looking forward to hearing if the app runs for you 
:)


john


________________________________
From: Interrante, John A (GE Research, US) <inter...@research.ge.com>
Sent: Tuesday, September 29, 2020 5:34 PM
To: dev@daffodil.apache.org
Subject: RE: Validator SPI proposal

Hello John,

Using ServiceLoader looks reasonable.  I looked at your reference 
implementation and sample application, but can you clarify a question for me?  
First you build your forked Daffodil and your sample application separately in 
different directories.   Then how do you combine your forked Daffodil and 
sample schemetron implementation/application together so that your simplest 
usage example actually works?  That is, do you need to do step 1 below?

1.  Copy a jar from daffodil-schematron-validator/target/... to 
incubator-daffodil/daffodil-cli/target/universal/stage/lib?
        $ <please fill in this step>
2.  Define an alias (or create a symbolic link) to allow you to run your 
freshly built daffodil executable?
        $ alias 
daffodil="$HOME/incubator-daffodil/daffodil-cli/target/universal/stage/bin/daffodil"
3.  Run your simplest usage example?
        $ cd daffodil-schematron-validator
        $ daffodil parse --schema data/bmp.dfdl.xsd --validate sch=data/bmp.sch 
data/MARBLES.BMP

I'd have to do some research before I could say something about your bullet 
items for discussion:

- How to approach breaking changes in the Validator API
- How to evolve serialized API objects to prevent breakage in existing 
serialized objects (specifically from daffodil.api.ValidationMode)
- Is there a better overall approach to this

FYI, Daffodil already uses the ServiceLoader API to load user defined functions 
(daffodil-udf).  I don't know much about the UDF files; I found them only 
because I searched for any occurrences of ServiceLoader in Daffodil.  I don't 
know if you have seen these files and whether any of them informed your 
implementation, but I'll append a list of the UDF files for you to look at.

interran@GH3WPL13E:~/apache/incubator-daffodil-asf$ fd udf
daffodil-cli/src/it/scala/org/apache/daffodil/udf
daffodil-cli/src/it/scala/org/apache/daffodil/udf/TestCLIUdfs.scala
daffodil-runtime1/src/main/scala/org/apache/daffodil/udf
daffodil-test/src/test/resources/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
daffodil-test/src/test/resources/org/apache/daffodil/udf
daffodil-test/src/test/resources/org/apache/daffodil/udf/udfs.tdml
daffodil-test/src/test/scala/org/apache/daffodil/udf
daffodil-test/src/test/scala/org/apache/daffodil/udf/TestUdfsInSchemas.scala
daffodil-udf
daffodil-udf/src/main/java/org/apache/daffodil/udf
daffodil-udf/src/test/java/org/badudfs
daffodil-udf/src/test/java/org/badudfs/annotations/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
daffodil-udf/src/test/java/org/badudfs/evaluate/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
daffodil-udf/src/test/java/org/badudfs/functionclasses1/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
daffodil-udf/src/test/java/org/badudfs/functionclasses2/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
daffodil-udf/src/test/java/org/badudfs/nonUDF
daffodil-udf/src/test/java/org/badudfs/nonUDF/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
daffodil-udf/src/test/java/org/jgoodudfs
daffodil-udf/src/test/resources/org/apache/daffodil/udf
daffodil-udf/src/test/resources/org/apache/daffodil/udf/genericUdfSchema.xsd
daffodil-udf/src/test/resources/org/badmetainf/nonexistentclass/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
daffodil-udf/src/test/resources/org/goodmetainf/IntegerFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
daffodil-udf/src/test/resources/org/goodmetainf/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
daffodil-udf/src/test/scala/org/sbadudfs
daffodil-udf/src/test/scala/org/sbadudfs/functionclasses/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
daffodil-udf/src/test/scala/org/sbadudfs/functionclasses2/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
daffodil-udf/src/test/scala/org/sbadudfs/udfexceptions
daffodil-udf/src/test/scala/org/sbadudfs/udfexceptions/evaluating/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
daffodil-udf/src/test/scala/org/sbadudfs/udfexceptions2
daffodil-udf/src/test/scala/org/sbadudfs/udfexceptions2/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
daffodil-udf/src/test/scala/org/sbadudfs/udfpexceptions
daffodil-udf/src/test/scala/org/sbadudfs/udfpexceptions/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
daffodil-udf/src/test/scala/org/sbadudfs/udfpexceptions2
daffodil-udf/src/test/scala/org/sbadudfs/udfpexceptions2/StringFunctions/META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider
daffodil-udf/src/test/scala/org/sgoodudfs
interran@GH3WPL13E:~/apache/incubator-daffodil-asf$ rg ServiceLoader
daffodil-udf/README.md
36:This class will act as a traditional service provider as explained in the 
ServiceLoader API, and must have a 
*META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider* file in 
its project. This file must contain the fully qualified name(s) of the 
**provider class(es)** in the JAR. Without that file, neither this class nor 
any of the User Defined Function classes it provides will be visible to 
Daffodil.
86:Each UDF is registered by including the fully qualified name of its provider 
in a text file named 
`META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider`. The 
META-INF folder must be accessible from the root of whatever paths are on the 
classpath, otherwise it won't be picked up by ServiceLoader.

daffodil-udf/src/main/java/org/apache/daffodil/udf/UserDefinedFunctionProvider.java
21: * Abstract class used by ServiceLoader to poll for UDF providers on 
classpath.

daffodil-runtime1/src/main/scala/org/apache/daffodil/udf/UserDefinedFunctionService.scala
22:import java.util.ServiceLoader
83:    val loader: ServiceLoader[UserDefinedFunctionProvider] = 
ServiceLoader.load(classOf[UserDefinedFunctionProvider])
185:     * We catch any errors thrown by the ServiceLoader here. This usually 
means UDFP
interran@GH3WPL13E:~/apache/incubator-daffodil-asf$

John

-----Original Message-----
From: Wass, John L <wa...@ctc.com>
Sent: Tuesday, September 29, 2020 10:20 AM
To: dev@daffodil.apache.org
Subject: EXT: Validator SPI proposal

Greetings,

Please consider the following proposal to extend the Daffodil Infoset 
Validation API.  The proposed changes support deploying custom validation 
implementations that are not built as part of the Daffodil distribution but are 
instead made available at runtime as Java Service Provider Interface (SPI) [1] 
"plug-ins".

The intent here is to enable a wide range of validation approaches without 
increasing overhead for the Daffodil project, while increasing the velocity at 
which such implementations can be deployed.

To support the discussion there is a minimally functional reference 
implementation for Daffodil[2] and sample application using Schematron in a 
standalone project[3].

I look forward to discussing the approach in more detail.


Approach
---

1. Extract a Validator interface that describes validation behavior.
2. Detect implementations of this interface at runtime using SPI.
3. Parse additional validation arguments from CLI 4. Pass "Custom" validators 
through the existing api.ValidationMode.
5. Change ParseResult to execute validation through a SPI provided instance.

- Instances of the Validator are accessed at runtime using SPI metadata from 
META-INF.
- The existing Validator behavior remains and is installed as the "default" 
behavior.
- The current CLI arguments for validation would not change, but an extended 
set of parse patterns is added.


CLI Usage
---

In the Schematron sample application there are a few CLI patterns impemented 
for reference.

The simplest usage, using the BMP schema, is

`daffodil parse --schema data/bmp.dfdl.xsd --validate sch=data/bmp.sch 
data/MARBLES.BMP`

Where 'sch' is the lookup name for the SPI validator and following the '=' is 
an argument which points to the schematron to use.

There are other argument configurations that will need discussed.


Unknowns
---

- How to approach breaking changes in the Validator API
- How to evolve serialized API objects to prevent breakage in existing 
serialized objects (specifically from daffodil.api.ValidationMode)
- Is there a better overall approach to this  :P



1. https://docs.oracle.com/javase/tutorial/ext/basics/spi.html
2. https://github.com/ctc-oss/incubator-daffodil
3. https://github.com/ctc-oss/daffodil-schematron-validator



--
John Wass
Software Engineer
Concurrent Technologies Corporation


-----------------------------------------------------------------
This message and any files transmitted within are intended solely for the 
addressee or its representative and may contain company proprietary 
information.  If you are not the intended recipient, notify the sender 
immediately and delete this message.  Publication, reproduction, forwarding, or 
content disclosure is prohibited without the consent of the original sender and 
may be unlawful.

Concurrent Technologies Corporation and its Affiliates.
www.ctc.com<http://www.ctc.com>  1-800-282-4392
-----------------------------------------------------------------

Reply via email to