On 2015-11-26, Kornel Benko wrote:
> On 26 November 2015 at 11:23:46, Guenter Milde <mi...@users.sf.net> wrote:

>> The following proposal for an export test case categorisation tries to
>> avoid the controversial terms "inverted/reverted", "suspended", and
>> "ignored".

>> Instead, the basic distinction is between "good" tests and "known problems".

>> While the concept of "known problems" roughly matches "inverted", there
>> are some differences:

>> * tests with "known problems" usually fail, but may also pass.

>> * a line 

>>     KNOWN_PROBLEM.<subtag>.export/...

>>   is easier to understand than

>>     INVERTED-SEE-README.export/...

> Hm, yes. But the first entries are not in any subcategory (no subtag).

Even without a subtag, 

  KNOWN_PROBLEM.export/...

>>   is easier to understand than

>>     INVERTED-SEE-README.export/...

;-)

...

>> * There is no need for a top-level category "unreliable".

> I added it to please you ... :(

Ah, I see. I thought it was to allow an easier description:

\begin_layout Description
-nonstandard In primary sense such test means "requires non-standard resources
- (LaTeX packages and document classes, fonts, ...
- that are not a requirement for running this test suite".
+nonstandard Requires non-standard resources (LaTeX packages and document
+ classes, fonts, ...) that are not a requirement for running this test suite.
 \end_layout
 
 \begin_deeper
 \begin_layout Standard
-In a wider sense, it is currently used also for "not to be expected to succeed
- on every site that runs this test suite".
- This wider definition includes tests that have "arbitrary" result depending
- on local configuration, OS, TeX distribution, package versions, or the
- phase of the moon.
-\end_layout
-
-\begin_layout Standard
 These tests are labelled as 
 \family typewriter
 
 
...

Unreliable test cases are test cases with a known problem. The correct,
full hierarchy would be

  * known problems
    ...
    - unreliable
      · nonstandard
      · erratic
    
If we do not want 3 levels with sub-subcategories, we can just remove the
level "unreliable" and add its subcategories below "known_problems":

  * known problems
    ...
    - nonstandard
    - erratic
    
or use "unreliable" on the same level as "known problems":
  
  * known problems
    ...
  * unreliable
    - nonstandard
    - erratic

whichever suits you better.

  

>> Export Test Categorisation
>> --------------------------

>> To get a feel for the severity of a known problem, it makes sense to
>> sort known problems into sub-categories, e.g.


>> * TODO            # problems we want to solve but currently cannot.

>> * minor           # problems that may eventually be solved

>> * wontfix         # LyX problems with cases so special we decided to 
>>                   # leave them, or LaTeX problems that 
>>                   # - can't be solved due to systematic limitations, or
>>                   # - are bugs in "historic" packages no one works on.

>> * wrong output    # the output is corrupt; LyX should raise an error
>>                   # but export returns success.

>> * LaTeX bug       # problems due to LaTeX packages or other "external"
>>                   # reasons (someone else's problems)
>>                   # that may eventually be solved.
>>                   # (In that case, the test goes to "unreliable" until
>>                   # everyone has the version with the fix.)

>> * nonstandard     # requires packages or other resources that are not on CTAN
>>                   # (some developers may have them installed)

>> * erratic         # depending on local configuration, OS, TeX distribution,
>>                   # package versions, or the phase of the moon.


> Feels good, but who shall categorize?

This will be collaborative work.  Normally, this would be done when
addressing a new "known problem".

But first we need to agree on and set up the framework.

Proposal
========

* Rename "inverted" to "known_problem" and the file
  autotests/revertedTests to autotests/problematicTests.
  
  - in test mode (looking for regressions), the result of these test
    cases is irrelevant
    
  - in maintenance mode, the label should be removed from test cases
    that pass.

  This means that `ctest -L export` should not run tests labelled
  "known_problem".

  Running `ctest` (without -L) should list the failing tests with
  "known problems" rather than the passing ones; this is less confusing.
  
  Motivation: In "test mode",
    · a test that fails to fail is no problem (we are looking for regressions),
    · a test that fails for a "known reason" is recognised as such by its
      label and can be ignored by the user or a post-processing script.
  
  (BTW: in the list of failing tests recently sent by Scott, there were a
  number of "INVERTED_SEE-README" tests. Does this mean these tests failed
  or does it mean these tests failed to fail?)
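
  For illustration, a minimal sketch of what this could look like on the
  CMake/CTest side; the test name and command below are placeholders, not
  the actual autotests code:

    # Hypothetical fragment of a CTest setup (not the current autotests code).
    enable_testing()

    # Placeholder test; the real command would be the LyX export invocation.
    add_test(NAME "export/doc/SomeDocument_pdf2"
             COMMAND ${CMAKE_COMMAND} -E echo "exporting")

    # Tag the test case as both an export test and a known problem.
    set_tests_properties("export/doc/SomeDocument_pdf2"
      PROPERTIES LABELS "export;known_problem")

    # Then:
    #   ctest -L export -LE known_problem   # regression run, skips known problems
    #   ctest -L known_problem              # maintenance run, re-check known problems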

* Handle "unreliable" test cases similar to "known problems": create test
  instances with a telling label (unless they are wontfix):

  - in test mode (looking for regressions), the result of these test
    cases is irrelevant
    
  - in maintenance mode, the label should be removed from test cases
    that pass everywhere and every time. 
    
  However:
    The label "unreliable" is an indicator that the test is not "good"
    just because it passes at one site or at one point in time.
    To remove this label, you need confirmation from other developers
    that the problem is really solved.
    
    
* Rename autotests/ignoredTests to autotests/wontfixTests and move
  "wontfix" problems there.

* Rename autotests/suspendedTests to autotests/fragileTests.

  Use this label for all fragile tests, not only inverted ones. 
  
> The problem is the huge number of tests which do not fail. They are not
> categorized ATM.

The idea here is a file autotests/fragileTests with "wide" regular
expressions, e.g.

   .*pdf4SystemF
   .*Math.*

This should apply to all tests, not only the ones with "known problems".   

Then, when there is a "regression" in one of the fragile tests, it will
be shown with the "moderating" label "fragile", indicating that the cause
is more likely a surfacing problem with the document or export format than
a new one.
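
A rough sketch of how such a pattern file might be applied on the CMake
side; the path variable, the list variable `lyx_testnames` (assumed to hold
the names of all defined tests), and the overall structure are assumptions
for illustration, not the current autotests code:

   # Hypothetical: append a "fragile" label to every already-defined test
   # whose name matches one of the "wide" regular expressions in the file.
   file(STRINGS "${TOP_SRC_DIR}/autotests/fragileTests" fragile_patterns)

   foreach(pattern IN LISTS fragile_patterns)
     # Skip comment and empty lines.
     if(NOT pattern MATCHES "^#" AND NOT pattern STREQUAL "")
       foreach(testname IN LISTS lyx_testnames)
         if(testname MATCHES "${pattern}")
           set_property(TEST "${testname}" APPEND PROPERTY LABELS "fragile")
         endif()
       endforeach()
     endif()
   endforeach()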



>> If we want to make sure that no "good fail" is transformed into a
>> "wrong output", we would need a category "assert fail" and report an
>> export that does not fail:

>> * assert fail     # we know the export does not work for a permanent reason
>>                   # and want to test whether LyX correctly fails
>>                   # (e.g. pdflatex with a package requiring LuaTeX)


> That is for later, used in autotests/export.
> All other lyx-files (except attic) are distributed. Normally we expect
> them to be in good shape.

Yes.
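
If such an "assert fail" category is introduced later, CTest's WILL_FAIL
property looks like a natural building block. A minimal sketch with a
placeholder test name, label, and command (not what autotests does today):

   # Hypothetical "assert fail" test: the real command would be the export
   # that is known to fail; here, a stand-in that always exits non-zero.
   enable_testing()
   add_test(NAME "assertfail.export/doc/NeedsLuaTeX_pdf2"
            COMMAND ${CMAKE_COMMAND} -E compare_files no_such_file_a no_such_file_b)

   # WILL_FAIL inverts the result: the test passes only if the command fails,
   # so an export that unexpectedly succeeds shows up as a test failure.
   set_tests_properties("assertfail.export/doc/NeedsLuaTeX_pdf2"
     PROPERTIES WILL_FAIL TRUE LABELS "export;assertfail")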


Günter


