Re: Daffodil-1300

Beckerle, Mike Thu, 08 Oct 2020 11:03:16 -0700

I've depended, since the earliest days of Daffodil, on being able to click on 
one of the


    def test_foo() = { Runner.runOneTest("foo") }

And do "debug test" and have it run that single test in the breakpoint debugger.
I do this dozens of times a day when working on Daffodil using both eclipse, 
and using Intellij IDEA.  This is all leveraging IDE support for JUnit.

So whatever we do, that has to remain very convenient.

The redundancy of having the name of the test twice is annoying.  This would be 
better:

   @Test def foo() = r.run1

Where run1 with no arguments instructs it to derive the test name from the 
method name. That would require naming conventions for tests to be scala 
identifier-friendly, or mappable.

That's as small as I can make it knowing it would remain JUnit friendly.

I'm not sure the method name is available to a method by means other than using 
debugger/backtrace kinds of tricks.


________________________________
From: Sloane, Brandon <bslo...@owlcyberdefense.com>
Sent: Thursday, October 8, 2020 12:30 PM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Re: Daffodil-1300

I've used dynamic tests in a couple of schema projects I was working on.

My recollection is that the IDE integration made it difficult to run a single 
test, as you first need to generate all the tests; and there was not a good way 
to cache the results.

In the case of Daffodil, we also take advantage of the fact that we can comment 
out specifc tests. I don't think dynamic tests have a good way of doing that 
(although we can probably implement a test blacklist ourselves).

Another approach I have taken that seems to work better is to have a code 
generator produce the boilerplate test classes. We still need to figure out how 
to persistently disable tests, but this should avoid IDE integration issues.
________________________________
From: Steve Lawrence <slawre...@apache.org>
Sent: Thursday, October 8, 2020 11:28 AM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Re: Daffodil-1300

Skimming the Runner code, I'm not entirely convinced this is all working
as expected. But I think Ian is looking into that as part of this bug.
Seems like a reasonable time to confirm the complexity of our TDML
runner and all the caching is actually working right before updating all
the tests to use Runners.

That said, I'm wondering if maybe there is a better approach to our
testing infrastructure. I wonder if all of our boilerplate like this
could be removed:

  @Test test_foo = { runner.runOneTest("foo") }

This is all pretty much boilerplate.

It seems like it should be possible to have some sort of JUnit TestSuite
factory that simply scans for all the .tdml files, creates a unique
Runner for each tdml file (thus avoiding any duplication issues) and
then dynamically creates tests for all the parser/unparserTestCases. All
this scala stuff feels unnecessary to me.

Optional provided to the Runner (e.g. validation) can become attributes
on the tdml:testSuite attribute.

And we already have the "implementation" attribute that says which
implementations a tdml:parser/unparserTestCase works with. So rather
than commenting out tests in the scala file, we can just set
implementation="" to state it doesn't work with daffodil.

Though, I'm not sure if JUnit actually has a concept of dynamic test
suites (it does have dynamic tests), and how nicely that plays with
IDEs. I feel like we also shouldn't necessarily be tied to JUnit. While
maybe preferred, I would be willing to switch to something else if it
has this sort of capability and works with IDEs.



On 10/8/20 10:59 AM, Beckerle, Mike wrote:
> The point of runners and the object constructing them, was to share the 
> XML-loading of the TDML, and computation of all the test suite objects from 
> it, which was otherwise being loaded repeatedly and computed repeatedly.
>
> It is sensible to question this. There's a bunch of requirements we may have 
> lost sight of, so let me see if I can review some of them here.
>
>   *    Some TDML files are *huge* and it is quite slow if the TDML gets 
> loaded (as XML) multiple times. Clearly a TDML file can contain multiple 
> tests and the work to load, parse, compile schemas in, and prepare for tests 
> should happen exactly once per TDML file, regardless of how test drivers are 
> used to invoke the individual tests.
>      *   Decomposing these into smaller files is desirable, but shouldn't be 
> necessary.
>   *   Test drivers (the scala junit calls) that enable easy testing of a 
> single test of a TDML file are essential to easy debugging. They must work 
> from sbt and from IDEs
>   *   JUnit is the preferred idiom, not scalatest
>   *   Boilerplate in the scala test driver code is to be minimized. Nothing 
> that doesn't have to be expressed should be expressed, management of objects 
> and their lifetimes shouldn't be required. (This argues that the shutdown 
> code shouldn't be required) The only things that should have to be specified 
> are:
>      *   tdml file name and path
>      *   Options such as whether to validate the TDML or not, schema or not
>      *   for each junit test, the name of the test that appears in the TDML 
> file.
>
>   *   Tests are generally run on multiple threads. Not just whole TDML suites 
> at a time, but even within one TDML file, many Junit test drivers get kicked 
> off simultaneously on multiple threads. This parallelism is needed to make 
> the test suite run in a reasonable amount of time.
>      *   This cannot construct new copies of the entire DFDLTestSuite or the 
> overhead would be too high.
>   *   The volume of tests and duration of the runs requires that test data 
> structures such as the DFDLTestSuite objects be incrementally garbage 
> collected.
>
> Requirements on the TDML runner (DFDLTestSuite and its methods) to behave 
> properly and be coded properly may not be properly articulated.
>
> The DFDLTestSuite, and any objects it creates such as ParserTestCase and 
> UnparserTestCase, should have state members only for digesting the definition 
> of the test suite and test definitions. (i.e., should be all vals, or lazy 
> vals, not vars, and they should not compute anything that comes from running 
> the test, only from the test definitions.) They should be stateless otherwise.
>
> I'm not at all sure this is the case.
>
> There are multiple, possibly overlapping, ad-hoc fixes to these issues.
>
> There's a schema cache which tries to avoid recompiling the same DFDL schema 
> over and over for various tests defined in the same TDML file. This seems to 
> be needed because tests may use the same DFDL schema and same root element, 
> but this isn't apparent except by examining the test case specifically. So 
> some sort of cache is needed, as the same schema with different config 
> options may be used.
>
> The Runner object is effectively trying to be a cache of the DFDLTestSuite 
> object so that these are not constructed multiple times.
>
>
>
>
> ________________________________
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Wednesday, October 7, 2020 6:17 PM
> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> Subject: Re: Daffodil-1300
>
> So why do we recommend initialize the runners in a singleton object and
> having custom logic to clean them up when the test finishes? Why not
> just put them in the class and let them be cleaned up by the GC when the
> test finishes?
>
> On 10/7/20 6:11 PM, Beckerle, Mike wrote:
>> The Runner objects are normally members of a scala singleton object. That 
>> way they are constructed once and shared by all the tests.
>>
>> My understanding is once a singleton object is required, it is constructed, 
>> and then never goes away for the life of the JVM (actually the class loader, 
>> but those aren't released typically either, so life of the JVM) (per 
>> https://stackoverflow.com/questions/3956652/when-are-scala-objects-garbage-collected)
>>
>> So as Runners are usually val or lazy val of these singletons, they will 
>> never go out of scope.
>>
>>
>>
>>
>>
>>
>>
>>
>> ________________________________
>> From: Carlson, Ian <icarl...@owlcyberdefense.com>
>> Sent: Wednesday, October 7, 2020 5:51 PM
>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>> Subject: RE: Daffodil-1300
>>
>>
>> A conversation with Steve Lawrence, and digging deeper into the ThreadLocal 
>> aspect of the Runner and DFDLTestSuite objects has called into question 
>> whether the shutdown routines are still needed or not. My understanding is 
>> that Scala is a garbage collected language, and should dump objects and 
>> classes which are no longer in use – so long as references to them are not 
>> being held by still-existing objects.
>>
>>
>>
>> The shutdown routine called by the Runner class simply invokes 
>> ts_ts.set(null), where tl_ts is of type ThreadLocal[DFDLTestSuite]. 
>> Presumably this removes the reference to the DFDLTestSuite and allows it to 
>> be garbage collected, but in current cases, the Runner  containing the 
>> DFDLTestSuite will simply go out of scope once the block of tests is 
>> completed.
>>
>>
>>
>> The Runner objects are constructed as lazy val types, so it is my 
>> expectation that they will be constructed only when used – as in the 
>> TresysTests.scala file – lines 108-181 – and then be released at the end of 
>> that scope.
>>
>>
>>
>> So is the invocation of .set(null) on the ThreadLocal[DFDLTestSuite] objects 
>> still actually necessary?
>>
>>
>>
>> [A picture containing object, clock  Description automatically generated]    
>>       Ian Carlson | Software Engineer
>>
>> [Owl Cyber Defense]
>>
>> W  icarl...@owlcyberdefense.com<https://owlcyberdefense.com/>
>>
>> Connect with us!
>>
>> Facebook<https://www.facebook.com/owlcyberdefense/> | 
>> LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | 
>> Twitter<https://twitter.com/owlcyberdefense>
>>
>>
>>
>> [Find us at our next event. Click 
>> Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>
>>
>>
>>
>> The information contained in this transmission is for the personal and 
>> confidential use of the individual or entity to which it is addressed.
>>
>> If the reader is not the intended recipient, you are hereby notified that 
>> any review, dissemination, or copying of this communication is strictly 
>> prohibited.
>>
>> If you have received this transmission in error, please notify the sender 
>> immediately
>>
>>
>>
>>
>>
>> From: Beckerle, Mike<mailto:mbecke...@owlcyberdefense.com>
>> Sent: Thursday, October 1, 2020 4:03 PM
>> To: dev@daffodil.apache.org<mailto:dev@daffodil.apache.org>
>> Subject: Re: Daffodil-1300
>>
>>
>>
>> I think the DFDLTestSuite constructor can't stay deprecated. We still need 
>> to construct them.
>>
>> It can become package private I think, so not usable outside the package.  I 
>> think the scala syntax for that is:
>>
>> class Foo private[pkgName] (arg1: Type1, arg2: Type2) { ...
>> }
>>
>> The deprecation was just to help in hunting down the usages.
>>
>> If we need more Runner functions that's ok. But probably we can just depend 
>> on Scala's ability to pass arguments by specifying the argument names 
>> explicitly and getting defaults for the rest.  I think calls to Runner 
>> already do this in many cases.
>>
>>
>>
>>
>>
>> ________________________________
>> From: Carlson, Ian <icarl...@owlcyberdefense.com>
>> Sent: Thursday, October 1, 2020 4:14 PM
>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>> Subject: Daffodil-1300
>>
>>
>> Regarding DAFFODIL-1300, I’m hunting down usage of “new DFDLTestSuite” and 
>> replacing them with usage of the “Runner” object. In order to help identify 
>> where the DFDLTestSuite was being invoked, I un-commented:
>>
>>
>>
>> daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala 
>> line 198
>>
>> @deprecated("2016-12-30", "Use Runner(...) instead."
>>
>>
>>
>> This revealed that the Runner object itself invokes “new DFDLTestSuite” when 
>> the runOneTest function is called. This seems to defeat the purpose of this 
>> task. I’m hoping for some guidance or opinions on when, if ever, it’s 
>> appropriate to invoke the DFDLTestSuite constructor.
>>
>>
>>
>> Related to this – I believe I’ll need to create more constructors for the 
>> Runner object in order to preserve the argument combinations currently 
>> invoked by some of the new DFDLTestSuite calls – in particular when 
>> compileAllTopLevel must be provided as true when paired with an xml node 
>> instead of a file name.
>>
>>
>>
>> Is deprecating the DFDLTestSuite constructor still appropriate, or since 
>> this ticket was written have we changed or strategy such that it shouldn’t 
>> be called directly, but will need to be used by the Runner object?
>>
>>
>>
>> [A picture containing object, clock  Description automatically generated]    
>>       Ian Carlson | Software Engineer
>>
>> [Owl Cyber Defense]
>>
>> W  icarl...@owlcyberdefense.com<https://owlcyberdefense.com/>
>>
>> Connect with us!
>>
>> Facebook<https://www.facebook.com/owlcyberdefense/> | 
>> LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | 
>> Twitter<https://twitter.com/owlcyberdefense>
>>
>>
>>
>> [Find us at our next event. Click 
>> Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>
>>
>>
>>
>> The information contained in this transmission is for the personal and 
>> confidential use of the individual or entity to which it is addressed.
>>
>> If the reader is not the intended recipient, you are hereby notified that 
>> any review, dissemination, or copying of this communication is strictly 
>> prohibited.
>>
>> If you have received this transmission in error, please notify the sender 
>> immediately
>>
>>
>>
>>
>>
>>
>
>

Re: Daffodil-1300

Reply via email to