Re: [isabelle-dev] HOL-Quickcheck_Benchmark timeouts

Makarius Thu, 29 Nov 2012 13:22:14 -0800

On Thu, 29 Nov 2012, Makarius wrote:

I've also started some bisection about where the problem comes from,without any clear results so far, but I will report more a bit later.

I've spent a few more hours on the problem, repeating the bisectionseveral times, and staring add various changesets that are not at allclear from their text.



This is the result:

50e457bbc5fe bulwahn GOOD
6a26fed71755 bulwahn SKIP (BAD)
28cd3c9ca278 bulwahn SKIP (BAD)
fb696ff1f086 bulwahn BAD

Due to skipped revisions, the first bad revision could be any of:
changeset:   49943:6a26fed71755
user:        bulwahn
date:        Sat Oct 20 09:09:32 2012 +0200
summary:     passing names and types of all bounds around in the simproc

changeset:   49944:28cd3c9ca278
user:        bulwahn
date:        Sat Oct 20 09:09:33 2012 +0200

summary: tuned tactic in set_comprehension_pointfree simproc to handlecomposition of negation and vimage


changeset:   49945:fb696ff1f086
user:        bulwahn
date:        Sat Oct 20 09:09:34 2012 +0200
summary:     adjusting proofs

And here some explanation reconstructed from the investigations with someguesswork, glossing over the the unexplained things in these changesets.


(1) BAD means the following crash:

theory A imports Main
begin

find_unused_assms Fun

Warning - Unable to increase stack - interrupting thread

Exception trace for exception - Interrupt
Generated_Code.value(1)(1)(1)(1)(2)
Generated_Code.check_all_subsets(4)(1)
Exhaustive_Generators.compile_generator_expr(2)compile-(1)(1)(1)(1)(1)
Exhaustive_Generators.compile_generator_expr(2)(1)(1)
Quickcheck_Common.test_term_with_cardinality(5)test_card_size(4)
Quickcheck_Common.test_term_with_cardinality(5)test(2)
Quickcheck_Common.test_term_with_cardinality(5)
Quickcheck_Common.generator_test_goal_terms(5)

(2) SKIP means these changesets were broken: HOL did not compile.Backporting the "adjusting proofs" changeset fb696ff1f086, revealed thatthey were BAD, too.

Note that it does not make any sense to publish changesets where Pure orHOL do not compile. There is no obligation to have all session OK all thetime before pushing, but the usefulness of changesets is greatlydiminished. Someone else needs to spend much more time in exchange forthe 2-3 minutes saved in not running HOL on the spot, before saying "hgcommit". (I usually do a full "isabelle build -a" before *any* commit.)And of course, there is an obligation to run full "isabelle build -a" (ora proven equivalent) before any push.

(3) Looking at the critical change 6a26fed71755 "passing names and typesof all bounds around in the simproc" several times, I tried to understandwhy it affects the generated quickcheck code in such a bad way. (Thechangelog merely repeats the diff in English prose as a "parrot".)

So after backing-out this single-line change from 5a1194acbecd of today,everything worked fine except forsrc/HOL/ex/Set_Comprehension_Pointfree_Tests.thy crashing as follows:


*** Wellsortedness error
*** (in code equation Set_Comprehension_Pointfree_Tests.union ?a ?b == {x. x : 
?a | x : ?b}):
*** Type nat not of sort enum
*** No type arity nat :: enum
*** At command "export_code" (line 134 of 
"~~/src/HOL/ex/Set_Comprehension_Pointfree_Tests.thy")

Looking in the vicinity of the many other changes related to the criticalthree above reveals the following:


changeset:   49947:29cd291bfea6
user:        bulwahn
date:        Sat Oct 20 09:09:37 2012 +0200
files:       src/HOL/ex/Set_Comprehension_Pointfree_Tests.thy
description:

adding another test case for the set_comprehension_simproc to the theoryin HOL/ex

So here one could guess from the greater context that 6a26fed71755with its SKIP (BAD) status was motivated by a problem with passing sortinformation down to the generated code. Since this is the "exhaustive"quickcheck tester, having "nat :: enum" or not could make a difference inthe amount of enumeration done here.

So "BAD" might actually mean "better" in the sense of more exhaustive, butit breaks down the concrete application of find_unused_assms nonetheless.

Another side remark: Isabelle does not have a tradition for "test cases"and "unit tests". What we usually made in all these decades were somehalf-decent applications to explore tools or some stylized examples toshow the main points. These then also serve as a tests somehow. (There issometimes the odd situation that some manual or "test" theory is the onlyspot where certain features of certain tools occur, which means there islack of proof for practical relevance. I've seen this again just today ina different situation that is unrelated to this incident.)

Anyway, maybe Lukas himself or codegen export Florian knows how to resolvethe situation.

Once the HOL-Quickcheck-Benchmark session is up and running again, I wouldlike to apply some trivial changes to make it properly run in parallel.Then it should become part of the normal build, not under conditionISABELLE_FULL_TEST. This saves time in manuel re-testing along thehistory, which mira is better at.



        Makarius

_______________________________________________
isabelle-dev mailing list
isabelle-...@in.tum.de
https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev

Re: [isabelle-dev] HOL-Quickcheck_Benchmark timeouts

Reply via email to