On 10/13/2011 01:24 PM, Thomas Sewell wrote:
Good day all. Just wanted to let the Isabelle developers know about the latest 
feature David Matthews has added to PolyML, and to let you all know how useful 
it is.

The feature allows profiling of objects after garbage collection. When code is 
compiled with PolyML.compiler.allocationProfiling set to 1, all objects 
allocated are also given a pointer to their allocating function. When the 
garbage collector runs with PolyML.compiler.profiling set to 4, a statistical 
trace is printed of which objects survived garbage collection.

Cool.

So we have:
  profiling = 1 approximates runtime via Monte-Carlo-style sampling of the program counter
  profiling = 2 records the number of words allocated in each function (very accurate, IIRC)
  profiling = 3 ???
  profiling = 4 counts GC survivors (very accurately?)
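In source terms, the combination Thomas describes would be set up roughly like this. This is an untested sketch: the exact capitalization of the Compiler structure and the names of these refs may vary between PolyML versions, so treat it as illustrative only.

```sml
(* Untested sketch: tag every allocation with a pointer to its allocating
   function, then have each GC print a statistical trace of the survivors. *)
PolyML.Compiler.allocationProfiling := 1;
PolyML.Compiler.profiling := 4;
(* ... compile and run the code of interest here;
   survivor traces are printed at each garbage collection. *)
```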


This means that for Isabelle we get our first real look at what is taking up 
space at runtime and in images.

I include the last 20 lines of traces produced at four interesting points:
   1) After including the HOL theories in HOL's ROOT.ML
   2) After performing a PolyML.shareCommonData after (1)
   3) After adding a 400-element record to a test theory built on the HOL image.
   4) After performing a shareCommonData after (3)

These are traces for profiling=4?
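For reference, the shareCommonData step in (2) and (4) is presumably the usual idiom from Isabelle's image building. A sketch: shareCommonData traverses the data reachable from its argument and merges structurally equal immutable objects into single copies.

```sml
(* Sketch: share all common data reachable from the root of the heap,
   collapsing structurally equal immutable objects. *)
PolyML.shareCommonData PolyML.rootFunction;
```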

Isabelle is generating a *lot* of copies of types & terms, particularly via
Term.map_atyps. Since shareCommonData eliminates them, many of these must be
duplicates. Further use of the "same" variants from Term_Subst might help.
Alternatively, the repeated reconstruction may be necessary (e.g. repeated
generalization/abstraction of type variables), in which case further use of
the new Term_Sharing mechanism might be the answer.

The way I learned Isabelle programming, one views term traversal as cheap, and this seems to be true most of the time, especially when terms are freshly allocated and have good locality properties. Sharing lots of subterms might interfere with this.
Isn't this what GC was made for? Why introduce artificial sharing?

BTW: the Coq kernel does huge amounts of sharing IIRC.
Should we be concerned, or is this only an issue because of proof terms?

Makarius, please comment on this, because now I feel like a wasteful
programmer. ;D


Many of the persistent objects are Table and Net objects, as expected.

There are surprisingly many dummy tasks.

What is a dummy task?

A surprisingly large number of the persistent objects are associated with
proof terms and name derivations. Presumably this is not Thm.name_derivation
itself but inlined code from its subcalls Proofterm.thm_proof, proof_combP,
proof_combt' and Library.foldl, none of which are listed. If these foldl
loops really are producing this many objects, then perhaps the work done
unconditionally here should be rethought.

For proofs=0?
Taking a guess, these might be the PBody thms pointers.

Cheers,
  Andy


_______________________________________________
isabelle-dev mailing list
isabelle-...@in.tum.de
https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev
