Re: Daffodil-1300

Steve Lawrence Fri, 09 Oct 2020 05:07:37 -0700

Fair enough, sounds like we can't really get rid of hardcoded scala
tests if some depend on that ability.


The run1 is an interesting idea. It solves the problem I had some times
where I copy an old test and change the function name but not the string
name. And does avoid boilerplate. Looks like there is a way to get test
name information from junit, but it involves even more boilerplate.
Though, I don't think stack trace is awful

I think it's just a one liner to get the function that called runOne

  def runOne: Unit = {
    val testName = Thread.currentThread.getStackTrace()(2).getMethodName
    runOneTest(testName)
  }

Not sure if there's a performance impact, I believe getStackTrace is
relatively slow. But I imagine it's nothing compared to what an
individual test does.

Another option might be via scala macros, but that wouldn't help anyone
running from java, and I'm not even positive if it could be made to work.

As far Brandon's generator suggestion, one issue I see is how to
distribute that so that it's easily runnable. Ideally this same process
would be used for DFDL schema repos as well as Daffodil. Presumably it
would by a class with a Main in daffodil-tdml-lib, and users would need
to run it via the command line after the dependency is downloaded? Seems
unintutive for new users. Perhaps an DFDL Schema project sbt plugin is a
way to automate that? Also depends on a certain project layout, which
I'm not sure many projects follow.

I think using implemtation="" would be a good way to define which tests
work and which don't. Then you don't even need to comment out tests.
Those tests just get skipped when run.

On 10/8/20 2:02 PM, Beckerle, Mike wrote:
> I've depended, since the earliest days of Daffodil, on being able to click on 
> one of the
> 
>     def test_foo() = { Runner.runOneTest("foo") }
> 
> And do "debug test" and have it run that single test in the breakpoint 
> debugger.
> I do this dozens of times a day when working on Daffodil using both eclipse, 
> and using Intellij IDEA.  This is all leveraging IDE support for JUnit.
> 
> So whatever we do, that has to remain very convenient.
> 
> The redundancy of having the name of the test twice is annoying.  This would 
> be better:
> 
>    @Test def foo() = r.run1
> 
> Where run1 with no arguments instructs it to derive the test name from the 
> method name. That would require naming conventions for tests to be scala 
> identifier-friendly, or mappable.
> 
> That's as small as I can make it knowing it would remain JUnit friendly.
> 
> I'm not sure the method name is available to a method by means other than 
> using debugger/backtrace kinds of tricks.
> 
> 
> ________________________________
> From: Sloane, Brandon <bslo...@owlcyberdefense.com>
> Sent: Thursday, October 8, 2020 12:30 PM
> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> Subject: Re: Daffodil-1300
> 
> I've used dynamic tests in a couple of schema projects I was working on.
> 
> My recollection is that the IDE integration made it difficult to run a single 
> test, as you first need to generate all the tests; and there was not a good 
> way to cache the results.
> 
> In the case of Daffodil, we also take advantage of the fact that we can 
> comment out specifc tests. I don't think dynamic tests have a good way of 
> doing that (although we can probably implement a test blacklist ourselves).
> 
> Another approach I have taken that seems to work better is to have a code 
> generator produce the boilerplate test classes. We still need to figure out 
> how to persistently disable tests, but this should avoid IDE integration 
> issues.
> ________________________________
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Thursday, October 8, 2020 11:28 AM
> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> Subject: Re: Daffodil-1300
> 
> Skimming the Runner code, I'm not entirely convinced this is all working
> as expected. But I think Ian is looking into that as part of this bug.
> Seems like a reasonable time to confirm the complexity of our TDML
> runner and all the caching is actually working right before updating all
> the tests to use Runners.
> 
> That said, I'm wondering if maybe there is a better approach to our
> testing infrastructure. I wonder if all of our boilerplate like this
> could be removed:
> 
>   @Test test_foo = { runner.runOneTest("foo") }
> 
> This is all pretty much boilerplate.
> 
> It seems like it should be possible to have some sort of JUnit TestSuite
> factory that simply scans for all the .tdml files, creates a unique
> Runner for each tdml file (thus avoiding any duplication issues) and
> then dynamically creates tests for all the parser/unparserTestCases. All
> this scala stuff feels unnecessary to me.
> 
> Optional provided to the Runner (e.g. validation) can become attributes
> on the tdml:testSuite attribute.
> 
> And we already have the "implementation" attribute that says which
> implementations a tdml:parser/unparserTestCase works with. So rather
> than commenting out tests in the scala file, we can just set
> implementation="" to state it doesn't work with daffodil.
> 
> Though, I'm not sure if JUnit actually has a concept of dynamic test
> suites (it does have dynamic tests), and how nicely that plays with
> IDEs. I feel like we also shouldn't necessarily be tied to JUnit. While
> maybe preferred, I would be willing to switch to something else if it
> has this sort of capability and works with IDEs.
> 
> 
> 
> On 10/8/20 10:59 AM, Beckerle, Mike wrote:
>> The point of runners and the object constructing them, was to share the 
>> XML-loading of the TDML, and computation of all the test suite objects from 
>> it, which was otherwise being loaded repeatedly and computed repeatedly.
>>
>> It is sensible to question this. There's a bunch of requirements we may have 
>> lost sight of, so let me see if I can review some of them here.
>>
>>   *    Some TDML files are *huge* and it is quite slow if the TDML gets 
>> loaded (as XML) multiple times. Clearly a TDML file can contain multiple 
>> tests and the work to load, parse, compile schemas in, and prepare for tests 
>> should happen exactly once per TDML file, regardless of how test drivers are 
>> used to invoke the individual tests.
>>      *   Decomposing these into smaller files is desirable, but shouldn't be 
>> necessary.
>>   *   Test drivers (the scala junit calls) that enable easy testing of a 
>> single test of a TDML file are essential to easy debugging. They must work 
>> from sbt and from IDEs
>>   *   JUnit is the preferred idiom, not scalatest
>>   *   Boilerplate in the scala test driver code is to be minimized. Nothing 
>> that doesn't have to be expressed should be expressed, management of objects 
>> and their lifetimes shouldn't be required. (This argues that the shutdown 
>> code shouldn't be required) The only things that should have to be specified 
>> are:
>>      *   tdml file name and path
>>      *   Options such as whether to validate the TDML or not, schema or not
>>      *   for each junit test, the name of the test that appears in the TDML 
>> file.
>>
>>   *   Tests are generally run on multiple threads. Not just whole TDML 
>> suites at a time, but even within one TDML file, many Junit test drivers get 
>> kicked off simultaneously on multiple threads. This parallelism is needed to 
>> make the test suite run in a reasonable amount of time.
>>      *   This cannot construct new copies of the entire DFDLTestSuite or the 
>> overhead would be too high.
>>   *   The volume of tests and duration of the runs requires that test data 
>> structures such as the DFDLTestSuite objects be incrementally garbage 
>> collected.
>>
>> Requirements on the TDML runner (DFDLTestSuite and its methods) to behave 
>> properly and be coded properly may not be properly articulated.
>>
>> The DFDLTestSuite, and any objects it creates such as ParserTestCase and 
>> UnparserTestCase, should have state members only for digesting the 
>> definition of the test suite and test definitions. (i.e., should be all 
>> vals, or lazy vals, not vars, and they should not compute anything that 
>> comes from running the test, only from the test definitions.) They should be 
>> stateless otherwise.
>>
>> I'm not at all sure this is the case.
>>
>> There are multiple, possibly overlapping, ad-hoc fixes to these issues.
>>
>> There's a schema cache which tries to avoid recompiling the same DFDL schema 
>> over and over for various tests defined in the same TDML file. This seems to 
>> be needed because tests may use the same DFDL schema and same root element, 
>> but this isn't apparent except by examining the test case specifically. So 
>> some sort of cache is needed, as the same schema with different config 
>> options may be used.
>>
>> The Runner object is effectively trying to be a cache of the DFDLTestSuite 
>> object so that these are not constructed multiple times.
>>
>>
>>
>>
>> ________________________________
>> From: Steve Lawrence <slawre...@apache.org>
>> Sent: Wednesday, October 7, 2020 6:17 PM
>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>> Subject: Re: Daffodil-1300
>>
>> So why do we recommend initialize the runners in a singleton object and
>> having custom logic to clean them up when the test finishes? Why not
>> just put them in the class and let them be cleaned up by the GC when the
>> test finishes?
>>
>> On 10/7/20 6:11 PM, Beckerle, Mike wrote:
>>> The Runner objects are normally members of a scala singleton object. That 
>>> way they are constructed once and shared by all the tests.
>>>
>>> My understanding is once a singleton object is required, it is constructed, 
>>> and then never goes away for the life of the JVM (actually the class 
>>> loader, but those aren't released typically either, so life of the JVM) 
>>> (per 
>>> https://stackoverflow.com/questions/3956652/when-are-scala-objects-garbage-collected)
>>>
>>> So as Runners are usually val or lazy val of these singletons, they will 
>>> never go out of scope.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> ________________________________
>>> From: Carlson, Ian <icarl...@owlcyberdefense.com>
>>> Sent: Wednesday, October 7, 2020 5:51 PM
>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>> Subject: RE: Daffodil-1300
>>>
>>>
>>> A conversation with Steve Lawrence, and digging deeper into the ThreadLocal 
>>> aspect of the Runner and DFDLTestSuite objects has called into question 
>>> whether the shutdown routines are still needed or not. My understanding is 
>>> that Scala is a garbage collected language, and should dump objects and 
>>> classes which are no longer in use – so long as references to them are not 
>>> being held by still-existing objects.
>>>
>>>
>>>
>>> The shutdown routine called by the Runner class simply invokes 
>>> ts_ts.set(null), where tl_ts is of type ThreadLocal[DFDLTestSuite]. 
>>> Presumably this removes the reference to the DFDLTestSuite and allows it to 
>>> be garbage collected, but in current cases, the Runner  containing the 
>>> DFDLTestSuite will simply go out of scope once the block of tests is 
>>> completed.
>>>
>>>
>>>
>>> The Runner objects are constructed as lazy val types, so it is my 
>>> expectation that they will be constructed only when used – as in the 
>>> TresysTests.scala file – lines 108-181 – and then be released at the end of 
>>> that scope.
>>>
>>>
>>>
>>> So is the invocation of .set(null) on the ThreadLocal[DFDLTestSuite] 
>>> objects still actually necessary?
>>>
>>>
>>>
>>> [A picture containing object, clock  Description automatically generated]   
>>>        Ian Carlson | Software Engineer
>>>
>>> [Owl Cyber Defense]
>>>
>>> W  icarl...@owlcyberdefense.com<https://owlcyberdefense.com/>
>>>
>>> Connect with us!
>>>
>>> Facebook<https://www.facebook.com/owlcyberdefense/> | 
>>> LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | 
>>> Twitter<https://twitter.com/owlcyberdefense>
>>>
>>>
>>>
>>> [Find us at our next event. Click 
>>> Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>
>>>
>>>
>>>
>>> The information contained in this transmission is for the personal and 
>>> confidential use of the individual or entity to which it is addressed.
>>>
>>> If the reader is not the intended recipient, you are hereby notified that 
>>> any review, dissemination, or copying of this communication is strictly 
>>> prohibited.
>>>
>>> If you have received this transmission in error, please notify the sender 
>>> immediately
>>>
>>>
>>>
>>>
>>>
>>> From: Beckerle, Mike<mailto:mbecke...@owlcyberdefense.com>
>>> Sent: Thursday, October 1, 2020 4:03 PM
>>> To: dev@daffodil.apache.org<mailto:dev@daffodil.apache.org>
>>> Subject: Re: Daffodil-1300
>>>
>>>
>>>
>>> I think the DFDLTestSuite constructor can't stay deprecated. We still need 
>>> to construct them.
>>>
>>> It can become package private I think, so not usable outside the package.  
>>> I think the scala syntax for that is:
>>>
>>> class Foo private[pkgName] (arg1: Type1, arg2: Type2) { ...
>>> }
>>>
>>> The deprecation was just to help in hunting down the usages.
>>>
>>> If we need more Runner functions that's ok. But probably we can just depend 
>>> on Scala's ability to pass arguments by specifying the argument names 
>>> explicitly and getting defaults for the rest.  I think calls to Runner 
>>> already do this in many cases.
>>>
>>>
>>>
>>>
>>>
>>> ________________________________
>>> From: Carlson, Ian <icarl...@owlcyberdefense.com>
>>> Sent: Thursday, October 1, 2020 4:14 PM
>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>> Subject: Daffodil-1300
>>>
>>>
>>> Regarding DAFFODIL-1300, I’m hunting down usage of “new DFDLTestSuite” and 
>>> replacing them with usage of the “Runner” object. In order to help identify 
>>> where the DFDLTestSuite was being invoked, I un-commented:
>>>
>>>
>>>
>>> daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala 
>>> line 198
>>>
>>> @deprecated("2016-12-30", "Use Runner(...) instead."
>>>
>>>
>>>
>>> This revealed that the Runner object itself invokes “new DFDLTestSuite” 
>>> when the runOneTest function is called. This seems to defeat the purpose of 
>>> this task. I’m hoping for some guidance or opinions on when, if ever, it’s 
>>> appropriate to invoke the DFDLTestSuite constructor.
>>>
>>>
>>>
>>> Related to this – I believe I’ll need to create more constructors for the 
>>> Runner object in order to preserve the argument combinations currently 
>>> invoked by some of the new DFDLTestSuite calls – in particular when 
>>> compileAllTopLevel must be provided as true when paired with an xml node 
>>> instead of a file name.
>>>
>>>
>>>
>>> Is deprecating the DFDLTestSuite constructor still appropriate, or since 
>>> this ticket was written have we changed or strategy such that it shouldn’t 
>>> be called directly, but will need to be used by the Runner object?
>>>
>>>
>>>
>>> [A picture containing object, clock  Description automatically generated]   
>>>        Ian Carlson | Software Engineer
>>>
>>> [Owl Cyber Defense]
>>>
>>> W  icarl...@owlcyberdefense.com<https://owlcyberdefense.com/>
>>>
>>> Connect with us!
>>>
>>> Facebook<https://www.facebook.com/owlcyberdefense/> | 
>>> LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | 
>>> Twitter<https://twitter.com/owlcyberdefense>
>>>
>>>
>>>
>>> [Find us at our next event. Click 
>>> Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>
>>>
>>>
>>>
>>> The information contained in this transmission is for the personal and 
>>> confidential use of the individual or entity to which it is addressed.
>>>
>>> If the reader is not the intended recipient, you are hereby notified that 
>>> any review, dissemination, or copying of this communication is strictly 
>>> prohibited.
>>>
>>> If you have received this transmission in error, please notify the sender 
>>> immediately
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
> 
>

Re: Daffodil-1300

Reply via email to