Re: Daffodil-1300

Beckerle, Mike Thu, 08 Oct 2020 08:00:02 -0700

The point of runners and the object constructing them, was to share the 
XML-loading of the TDML, and computation of all the test suite objects from it, 
which was otherwise being loaded repeatedly and computed repeatedly.


It is sensible to question this. There's a bunch of requirements we may have 
lost sight of, so let me see if I can review some of them here.

  *    Some TDML files are *huge* and it is quite slow if the TDML gets loaded 
(as XML) multiple times. Clearly a TDML file can contain multiple tests and the 
work to load, parse, compile schemas in, and prepare for tests should happen 
exactly once per TDML file, regardless of how test drivers are used to invoke 
the individual tests.
     *   Decomposing these into smaller files is desirable, but shouldn't be 
necessary.
  *   Test drivers (the scala junit calls) that enable easy testing of a single 
test of a TDML file are essential to easy debugging. They must work from sbt 
and from IDEs
  *   JUnit is the preferred idiom, not scalatest
  *   Boilerplate in the scala test driver code is to be minimized. Nothing 
that doesn't have to be expressed should be expressed, management of objects 
and their lifetimes shouldn't be required. (This argues that the shutdown code 
shouldn't be required) The only things that should have to be specified are:
     *   tdml file name and path
     *   Options such as whether to validate the TDML or not, schema or not
     *   for each junit test, the name of the test that appears in the TDML 
file.

  *   Tests are generally run on multiple threads. Not just whole TDML suites 
at a time, but even within one TDML file, many Junit test drivers get kicked 
off simultaneously on multiple threads. This parallelism is needed to make the 
test suite run in a reasonable amount of time.
     *   This cannot construct new copies of the entire DFDLTestSuite or the 
overhead would be too high.
  *   The volume of tests and duration of the runs requires that test data 
structures such as the DFDLTestSuite objects be incrementally garbage collected.

Requirements on the TDML runner (DFDLTestSuite and its methods) to behave 
properly and be coded properly may not be properly articulated.

The DFDLTestSuite, and any objects it creates such as ParserTestCase and 
UnparserTestCase, should have state members only for digesting the definition 
of the test suite and test definitions. (i.e., should be all vals, or lazy 
vals, not vars, and they should not compute anything that comes from running 
the test, only from the test definitions.) They should be stateless otherwise.

I'm not at all sure this is the case.

There are multiple, possibly overlapping, ad-hoc fixes to these issues.

There's a schema cache which tries to avoid recompiling the same DFDL schema 
over and over for various tests defined in the same TDML file. This seems to be 
needed because tests may use the same DFDL schema and same root element, but 
this isn't apparent except by examining the test case specifically. So some 
sort of cache is needed, as the same schema with different config options may 
be used.

The Runner object is effectively trying to be a cache of the DFDLTestSuite 
object so that these are not constructed multiple times.




________________________________
From: Steve Lawrence <slawre...@apache.org>
Sent: Wednesday, October 7, 2020 6:17 PM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Re: Daffodil-1300

So why do we recommend initialize the runners in a singleton object and
having custom logic to clean them up when the test finishes? Why not
just put them in the class and let them be cleaned up by the GC when the
test finishes?

On 10/7/20 6:11 PM, Beckerle, Mike wrote:
> The Runner objects are normally members of a scala singleton object. That way 
> they are constructed once and shared by all the tests.
>
> My understanding is once a singleton object is required, it is constructed, 
> and then never goes away for the life of the JVM (actually the class loader, 
> but those aren't released typically either, so life of the JVM) (per 
> https://stackoverflow.com/questions/3956652/when-are-scala-objects-garbage-collected)
>
> So as Runners are usually val or lazy val of these singletons, they will 
> never go out of scope.
>
>
>
>
>
>
>
>
> ________________________________
> From: Carlson, Ian <icarl...@owlcyberdefense.com>
> Sent: Wednesday, October 7, 2020 5:51 PM
> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> Subject: RE: Daffodil-1300
>
>
> A conversation with Steve Lawrence, and digging deeper into the ThreadLocal 
> aspect of the Runner and DFDLTestSuite objects has called into question 
> whether the shutdown routines are still needed or not. My understanding is 
> that Scala is a garbage collected language, and should dump objects and 
> classes which are no longer in use – so long as references to them are not 
> being held by still-existing objects.
>
>
>
> The shutdown routine called by the Runner class simply invokes 
> ts_ts.set(null), where tl_ts is of type ThreadLocal[DFDLTestSuite]. 
> Presumably this removes the reference to the DFDLTestSuite and allows it to 
> be garbage collected, but in current cases, the Runner  containing the 
> DFDLTestSuite will simply go out of scope once the block of tests is 
> completed.
>
>
>
> The Runner objects are constructed as lazy val types, so it is my expectation 
> that they will be constructed only when used – as in the TresysTests.scala 
> file – lines 108-181 – and then be released at the end of that scope.
>
>
>
> So is the invocation of .set(null) on the ThreadLocal[DFDLTestSuite] objects 
> still actually necessary?
>
>
>
> [A picture containing object, clock  Description automatically generated]     
>      Ian Carlson | Software Engineer
>
> [Owl Cyber Defense]
>
> W  icarl...@owlcyberdefense.com<https://owlcyberdefense.com/>
>
> Connect with us!
>
> Facebook<https://www.facebook.com/owlcyberdefense/> | 
> LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | 
> Twitter<https://twitter.com/owlcyberdefense>
>
>
>
> [Find us at our next event. Click 
> Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>
>
>
>
> The information contained in this transmission is for the personal and 
> confidential use of the individual or entity to which it is addressed.
>
> If the reader is not the intended recipient, you are hereby notified that any 
> review, dissemination, or copying of this communication is strictly 
> prohibited.
>
> If you have received this transmission in error, please notify the sender 
> immediately
>
>
>
>
>
> From: Beckerle, Mike<mailto:mbecke...@owlcyberdefense.com>
> Sent: Thursday, October 1, 2020 4:03 PM
> To: dev@daffodil.apache.org<mailto:dev@daffodil.apache.org>
> Subject: Re: Daffodil-1300
>
>
>
> I think the DFDLTestSuite constructor can't stay deprecated. We still need to 
> construct them.
>
> It can become package private I think, so not usable outside the package.  I 
> think the scala syntax for that is:
>
> class Foo private[pkgName] (arg1: Type1, arg2: Type2) { ...
> }
>
> The deprecation was just to help in hunting down the usages.
>
> If we need more Runner functions that's ok. But probably we can just depend 
> on Scala's ability to pass arguments by specifying the argument names 
> explicitly and getting defaults for the rest.  I think calls to Runner 
> already do this in many cases.
>
>
>
>
>
> ________________________________
> From: Carlson, Ian <icarl...@owlcyberdefense.com>
> Sent: Thursday, October 1, 2020 4:14 PM
> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> Subject: Daffodil-1300
>
>
> Regarding DAFFODIL-1300, I’m hunting down usage of “new DFDLTestSuite” and 
> replacing them with usage of the “Runner” object. In order to help identify 
> where the DFDLTestSuite was being invoked, I un-commented:
>
>
>
> daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala 
> line 198
>
> @deprecated("2016-12-30", "Use Runner(...) instead."
>
>
>
> This revealed that the Runner object itself invokes “new DFDLTestSuite” when 
> the runOneTest function is called. This seems to defeat the purpose of this 
> task. I’m hoping for some guidance or opinions on when, if ever, it’s 
> appropriate to invoke the DFDLTestSuite constructor.
>
>
>
> Related to this – I believe I’ll need to create more constructors for the 
> Runner object in order to preserve the argument combinations currently 
> invoked by some of the new DFDLTestSuite calls – in particular when 
> compileAllTopLevel must be provided as true when paired with an xml node 
> instead of a file name.
>
>
>
> Is deprecating the DFDLTestSuite constructor still appropriate, or since this 
> ticket was written have we changed or strategy such that it shouldn’t be 
> called directly, but will need to be used by the Runner object?
>
>
>
> [A picture containing object, clock  Description automatically generated]     
>      Ian Carlson | Software Engineer
>
> [Owl Cyber Defense]
>
> W  icarl...@owlcyberdefense.com<https://owlcyberdefense.com/>
>
> Connect with us!
>
> Facebook<https://www.facebook.com/owlcyberdefense/> | 
> LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | 
> Twitter<https://twitter.com/owlcyberdefense>
>
>
>
> [Find us at our next event. Click 
> Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>
>
>
>
> The information contained in this transmission is for the personal and 
> confidential use of the individual or entity to which it is addressed.
>
> If the reader is not the intended recipient, you are hereby notified that any 
> review, dissemination, or copying of this communication is strictly 
> prohibited.
>
> If you have received this transmission in error, please notify the sender 
> immediately
>
>
>
>
>
>

Re: Daffodil-1300

Reply via email to