A fully functional API is probably good to have, and I'd like to define that 
also and cut over to using that within Daffodil.

We would need a dataProcessor.reset() method that creates a new dp which has 
had all external vars, tunables, basically any of those formerly settable 
things, forgotten so you can start clean.

(Would clean() be a better name than reset() ?)

This reset() subsumes my clone() method idea.

I want to deprecate (literally with "@deprecated" annotations) the setters 
generally, but my compromise is once you parse/unparse the object becomes 
immutable permanently. So the setters are just a different way to parameterize 
the object.

The other thing that is impacted by this change it the OpenDFDL/ibmCrossTester 
that implements these same APIs but drives IBM DFDL with them.  That's ok for 
now. I can retrofit that later. It will still work, just get deprecation 
warnings.

________________________________
From: Steve Lawrence <stephen.d.lawre...@gmail.com>
Sent: Wednesday, March 25, 2020 12:05 PM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged

This feels like the right approach. Thoughts on instead of having to
manually clone() and having some sort of locking mechanism when parse is
called, we just make a DataProcessor completely immutable, and setting a
variable will implicitly create a copy with modified state, something like:

val dp = dataProcessor
           .withTunable("tunableFoo", "1")
           .withVariable("variableFoo", 2")
           .withVariable("variableBar", 2")

It's not as java-y, but feels safer. I can imagine a case where someone
sets a bunch of state, calls parse and locks things, and then way down
the line they get a "data processor is locked" error because they tried
to change some state.


On 3/25/20 11:50 AM, Beckerle, Mike wrote:
> Here's my concrete proposal to fix DataProcesor's API.
>
> We add a clone() method. It copies the object sharing the big/expensive stuff 
> like the compiled schema but gives you a separate copy of all the state that 
> is settable via setValidationMode, setExternalDFDLVariable, and setTunable.
>
> So you get a DataProcessor either from a ProcessorFactory, or from 
> Compiler.reload().
>
> You set state.
> You can call parse/unparse on various threads
>
> If you want to change state settings for another parse call, you must call 
> clone() to get another instance of the DataProcessor. This will share the 
> expensive/big stuff like the compiled schema, but give you a separate copy of 
> all the state that is settable via setValidationMode, 
> setExternalDFDLVariable, and setTunable.
>
> Then you set state
> You can call parse/unparse on various threads.
>
> I hate to get into locking and such, but once you call parse/unparse, that 
> should lock the object state, and any further setter calls should fail/throw. 
> You should have to clone it to "change" the settings.
>
> This makes the use of the setters a java-ish way of achieving the same 
> reentrancy one would get if there were no setters and simply more arguments 
> to the parse/unparse calls.
>
> ________________________________
> From: Beckerle, Mike <mbecke...@tresys.com>
> Sent: Wednesday, March 25, 2020 11:29 AM
> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>
> To respond to my own thread here, I think given that we allow multiple 
> simultaneous calls to parse/unparse from different threads, we must say that 
> the DataProcessor object is immutable once parse or unparse are called.
>
> I suppose we could say that it is mutable, but behavior is undetermined if 
> any parse or unparse calls are active on any thread.
>
> But this is just asking for trouble IMHO.
>
> I think we started out with a stateful, non-thread capable API. The idea is 
> that one thread would be invoking a data processor at a time. A data procesor 
> was the state-block of an execution.
>
> The need to share compiled processor reloads, because the compile schemas 
> were expensive to create, tempted us to allow multiple parse/unparse calls on 
> different threads.
>
> Fact is, I think we should have said no to this, provided a 
> DataProcessor.clone() to create instances that shared the reloaded compiled 
> schema binary, but otherwise had separate state, and said that parse/unparse 
> were synchronized methods on their DP instance.
>
> Instead we're in a 1/2 way world where we don't have a thread-reasonable API 
> due to mutable state in what turns out to be a cross-thread shared object.
>
> ________________________________
> From: Beckerle, Mike <mbecke...@tresys.com>
> Sent: Wednesday, March 25, 2020 10:55 AM
> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> Subject: Compiler.setExternalDFDLVariable(s) considered challenged
>
> Why does the API for Daffodil have
>
> Compiler.setExternalDFDLVariable(...)
> and
> Compiler.setExternalDFDLVariables(...)
>
> on it.
>
> I believe we should deprecate this.
>
> Compilers are parameterized by some of the tunables I understand.
>
> But the external DFDL variables? These cannot affect compilation. The schema 
> compiler needs to know statically the information about variables found in 
> the schema itself in the dfdl:defineVariable statement annotations.
>
> But the compiler doesn't need external variable bindings. In fact if it did 
> know and use them, it would be building assumptions into the compiled schema 
> that it shouldn't be building in.
>
> Setting external var bindings on the Compiler just causes problems when that 
> compiler instance is reused in a context where those settings aren't 
> appropriate. (JIRA DAFFODIL-2302 is one such problem)
>
> I believe setExternalDFDLVariable(s) methods should be deprecated, and 
> external variables bindings should be an optional argument to the 
> parse/unparse methods.
>
> The setters cause thread safety issues because the DP is stateful, even 
> though we want multiple calls to parse/unparse to be executable on different 
> threads.
>
> Consider: if we allow ordinary setExternalDFDLVariables and add a 
> resetExternalDFDLVariables to clear them, then imagine one wants to make two 
> parse calls on separate threads with different external variables bindings:
>
> so on main thread.....
>      dp.setExternalVariables(...bindings 1...)
>      spawn the thread 1
> on the thread 1
>      dp.parse(....)
> back on main thread
>      dp.resetExternalVariables() // race condition. Did the parse call read 
> the external variables before this reset or not?
>      dp.setExternalVariables(...bindings 2....)
>     .....
>
> However, if we make the external variable bindings an argument to parse, we 
> avoid all of this.
>
> Alternatively, since DataProcessor has setExternalDFDLVariable, we can 
> prohibit multiple calls on the same DataProcessor object simultaneously. We 
> can provide a clone() method that preserves the loaded/reloaded processor, 
> but constructs another DataProcessor object, thereby allowing separate 
> external variables state per DataProcessor instance.
>
>
> Comments?
>
>
>
>
>
> Mike Beckerle | Principal Engineer
> [Owl Cyber Defense]<http://owlcyberdefense.com>
> [cid:2a423dec-0558-414e-b369-14a24becc40f] is now a part of 
> Owl<https://owlcyberdefense.com/news/owl-cyber-defense-tresys-technology-announce-merger/>
> P +1-781-330-0412
> W owlcyberdefense.com<http://www.owlcyberdefense.com>
>

Reply via email to