Here's my concrete proposal to fix DataProcesor's API. We add a clone() method. It copies the object sharing the big/expensive stuff like the compiled schema but gives you a separate copy of all the state that is settable via setValidationMode, setExternalDFDLVariable, and setTunable.
So you get a DataProcessor either from a ProcessorFactory, or from Compiler.reload(). You set state. You can call parse/unparse on various threads If you want to change state settings for another parse call, you must call clone() to get another instance of the DataProcessor. This will share the expensive/big stuff like the compiled schema, but give you a separate copy of all the state that is settable via setValidationMode, setExternalDFDLVariable, and setTunable. Then you set state You can call parse/unparse on various threads. I hate to get into locking and such, but once you call parse/unparse, that should lock the object state, and any further setter calls should fail/throw. You should have to clone it to "change" the settings. This makes the use of the setters a java-ish way of achieving the same reentrancy one would get if there were no setters and simply more arguments to the parse/unparse calls. ________________________________ From: Beckerle, Mike <mbecke...@tresys.com> Sent: Wednesday, March 25, 2020 11:29 AM To: dev@daffodil.apache.org <dev@daffodil.apache.org> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged To respond to my own thread here, I think given that we allow multiple simultaneous calls to parse/unparse from different threads, we must say that the DataProcessor object is immutable once parse or unparse are called. I suppose we could say that it is mutable, but behavior is undetermined if any parse or unparse calls are active on any thread. But this is just asking for trouble IMHO. I think we started out with a stateful, non-thread capable API. The idea is that one thread would be invoking a data processor at a time. A data procesor was the state-block of an execution. The need to share compiled processor reloads, because the compile schemas were expensive to create, tempted us to allow multiple parse/unparse calls on different threads. Fact is, I think we should have said no to this, provided a DataProcessor.clone() to create instances that shared the reloaded compiled schema binary, but otherwise had separate state, and said that parse/unparse were synchronized methods on their DP instance. Instead we're in a 1/2 way world where we don't have a thread-reasonable API due to mutable state in what turns out to be a cross-thread shared object. ________________________________ From: Beckerle, Mike <mbecke...@tresys.com> Sent: Wednesday, March 25, 2020 10:55 AM To: dev@daffodil.apache.org <dev@daffodil.apache.org> Subject: Compiler.setExternalDFDLVariable(s) considered challenged Why does the API for Daffodil have Compiler.setExternalDFDLVariable(...) and Compiler.setExternalDFDLVariables(...) on it. I believe we should deprecate this. Compilers are parameterized by some of the tunables I understand. But the external DFDL variables? These cannot affect compilation. The schema compiler needs to know statically the information about variables found in the schema itself in the dfdl:defineVariable statement annotations. But the compiler doesn't need external variable bindings. In fact if it did know and use them, it would be building assumptions into the compiled schema that it shouldn't be building in. Setting external var bindings on the Compiler just causes problems when that compiler instance is reused in a context where those settings aren't appropriate. (JIRA DAFFODIL-2302 is one such problem) I believe setExternalDFDLVariable(s) methods should be deprecated, and external variables bindings should be an optional argument to the parse/unparse methods. The setters cause thread safety issues because the DP is stateful, even though we want multiple calls to parse/unparse to be executable on different threads. Consider: if we allow ordinary setExternalDFDLVariables and add a resetExternalDFDLVariables to clear them, then imagine one wants to make two parse calls on separate threads with different external variables bindings: so on main thread..... dp.setExternalVariables(...bindings 1...) spawn the thread 1 on the thread 1 dp.parse(....) back on main thread dp.resetExternalVariables() // race condition. Did the parse call read the external variables before this reset or not? dp.setExternalVariables(...bindings 2....) ..... However, if we make the external variable bindings an argument to parse, we avoid all of this. Alternatively, since DataProcessor has setExternalDFDLVariable, we can prohibit multiple calls on the same DataProcessor object simultaneously. We can provide a clone() method that preserves the loaded/reloaded processor, but constructs another DataProcessor object, thereby allowing separate external variables state per DataProcessor instance. Comments? Mike Beckerle | Principal Engineer [Owl Cyber Defense]<http://owlcyberdefense.com> [cid:2a423dec-0558-414e-b369-14a24becc40f] is now a part of Owl<https://owlcyberdefense.com/news/owl-cyber-defense-tresys-technology-announce-merger/> P +1-781-330-0412 W owlcyberdefense.com<http://www.owlcyberdefense.com>