Re: [isabelle-dev] Using PolyML's memory consumption profiling on Isabelle
On Thu, 13 Oct 2011, Thomas Sewell wrote:
> 3) After adding a 400-element record to a test theory built on the HOL image.

Your original motivation was to isolate mysterious performance issues in the record package. Did you manage to pin that down in the measurements? In that case a bisection could reveal the point in history where the change happened, so one might learn from it another detail about the ML runtime system.

	Makarius
___
isabelle-dev mailing list
isabelle-...@in.tum.de
https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev
Re: [isabelle-dev] Using PolyML's memory consumption profiling on Isabelle
On Fri, 14 Oct 2011, Andreas Schropp wrote:
>> Isabelle is generating a *lot* of copies of types & terms, particularly via
>> Term.map_atyps. Since shareCommonData eliminates them, many are duplicates.
>> It's possible that further use of the "same" variants from Term_Subst might
>> help. It's also possible that the repeated reconstruction is necessary
>> (e.g. repeated generalization/abstraction of type variables) and further
>> use of the new Term_Sharing mechanism might be the answer.
>
> The way I learned Isabelle programming, one views term traversal as cheap,
> which seems to be true most of the time, especially when terms are freshly
> allocated with nice locality properties. Sharing lots of subterms might
> interfere with this. Isn't this what GC was made for? Why introduce
> artificial sharing?

There is a spectrum of things that can be done, and the various tendencies in optimization are often in conflict. Andreas is right in his intuition about how operations in Isabelle are usually done. GC works pretty well, and with an increasing tendency towards multiple cores and distributed memory, excessive sharing could become counter-productive.

We used to have global sharing of term structure for results, for the sake of SML/NJ, which lacks the convenience of share_common_data of Poly/ML. When the parallelization of Poly/ML and Isabelle/ML started around 2007, I removed that global resource for performance reasons. As a rule of thumb it is better to let parallel threads run freely and independently, even if they allocate redundant data -- in different regions of the heap.

I reintroduced some bits of Term_Sharing recently for different reasons: to ensure that terms entering the system via the new protocol layers are "interned" in the way one normally expects. (The concrete syntax layer did this implicitly, due to lookup of formal entities.)

The details of parallel / distributed memory management are a black art, at all levels (hardware, runtime system, application code). David Matthews is the one who understands that best :-)

> BTW: the Coq kernel does huge amounts of sharing IIRC. Should we be
> concerned, or is this just a thing because of proof terms?

I can't say what Coq does exactly -- the general tendency there (and in OCaml) seems to target hardware from the mid-1990s. They also have serious problems catching up with the multicore trend, but I am actually collaborating with some Coq experts on the parallel prover theme, so some improvements in the general understanding can be anticipated.

> Makarius, please comment on this, because now I feel like a wasteful
> programmer. ;D

I see no particular problems here. Just continue with a purely functional mindset, with a bit of a sense for the general workings of the runtime system and underlying hardware.

	Makarius
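The "same" variants mentioned above can be illustrated with a small sketch (modelled loosely after the Same/Term_Subst idiom in Isabelle/ML; the datatype and function names here are hypothetical simplifications, not the actual sources). The idea is that a transformation raises an exception when it changes nothing, so unchanged subterms are returned as the original objects instead of freshly allocated copies:

```sml
(* Sketch of the "same" idiom: raise SAME when nothing changed,
   so the caller can keep the original (shared) structure. *)
exception SAME

datatype typ = TFree of string | Fun of typ * typ

(* commit turns a SAME-raising transformation into an ordinary one *)
fun commit f x = f x handle SAME => x

fun map_typ_same (f: string -> typ) =
  let
    fun typ (TFree a) = f a   (* f raises SAME if the variable is untouched *)
      | typ (Fun (T, U)) =
          (* if the left side changed, rebuild; otherwise try the right side;
             if both raise SAME, the whole Fun node is reused by the caller *)
          (Fun (typ T, commit typ U) handle SAME => Fun (T, typ U))
  in typ end

(* example use: rename type variable "a", share everything else *)
val rename = commit (map_typ_same (fn "a" => TFree "b" | _ => raise SAME))
```

On an input that contains no "a", `rename` returns the original object itself, so no copy of the spine is allocated -- which is exactly the effect that would reduce the Term.map_atyps entries in the traces.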
Re: [isabelle-dev] Using PolyML's memory consumption profiling on Isabelle
On Fri, 14 Oct 2011, David Matthews wrote:
>> I can only guess that allocation of mutable stuff costs extra.
>
> Allocation of a mutable, at least a fixed-size mutable such as ref, doesn't
> cost any more than allocating an immutable. However, if a mutable survives
> a GC it has an impact on subsequent GCs. The worst case would be a mutable
> that survived one GC and then becomes unreachable. It would continue to be
> scanned in every partial GC until it was thrown away by the next full GC.
> Does this correspond with what you've found?

Yes, I was thinking in terms of the survival of the mutable, not the initial allocation. What happened in the example is that any Future.value (which is conceptually immutable) would retain a mutable field for timing information that is reachable but semantically never used later on. Thus it somehow impacts later memory management indirectly, leading to the measurable (but very small) overhead.

	Makarius
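The situation described here can be sketched in a few lines (hypothetical types, not the actual Future implementation): a conceptually immutable value that drags along a reachable but never-read mutable field. Each such ref that survives one GC keeps being scanned by every partial GC until a full GC finally collects it, which is the indirect overhead discussed above:

```sml
(* Sketch: an "immutable" value carrying a mutable timing field that is
   reachable forever but never consulted after construction. *)
datatype 'a future = Value of 'a * Time.time ref

fun value (x: 'a) : 'a future =
  Value (x, ref Time.zeroTime)   (* fresh ref per value -- a GC-visible mutable *)

fun get (Value (x, _)) = x       (* the timing ref is never read again *)
```

As long as such a `Value` is reachable, its ref is part of the mutable set the partial GC must scan, even though no code will ever write to or read from it again.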
Re: [isabelle-dev] Using PolyML's memory consumption profiling on Isabelle
On 14/10/2011 10:56, Makarius wrote:
> On Fri, 14 Oct 2011, Andreas Schropp wrote:
>>   val dummy_task = Task(NONE, ~1)
>>
>> Values are not shared?! What the hell?

Datatypes and tuples that contain only constant data are created once during compilation.

> This looks like an older version. Thomas was referring to this one in
> Isabelle/73dde8006820:
>
>   fun dummy_task () =
>     Task {group = new_group NONE, name = "", id = 0, pri = NONE,
>       timing = new_timing ()};
>
> Since the timing is a mutable variable here, it has to be created afresh
> for each use -- in Future.value construction. Normally 1 million extra
> allocations are not a big deal, but an experiment from yesterday shows
> that there is in fact a measurable impact. See now Isabelle/2afb928c71ca
> and the corresponding charts at
> http://www4.in.tum.de/~wenzelm/test/stats/at-poly.html
>
> I can only guess that allocation of mutable stuff costs extra.

Allocation of a mutable, at least a fixed-size mutable such as ref, doesn't cost any more than allocating an immutable. However, if a mutable survives a GC it has an impact on subsequent GCs. The worst case would be a mutable that survived one GC and then becomes unreachable. It would continue to be scanned in every partial GC until it was thrown away by the next full GC. Does this correspond with what you've found?

Regards,
David
Re: [isabelle-dev] Using PolyML's memory consumption profiling on Isabelle
On 14/10/2011 09:13, Andreas Schropp wrote:
> On 10/13/2011 01:24 PM, Thomas Sewell wrote:
>> Good day all. Just wanted to let the Isabelle developers know about the
>> latest feature David Matthews has added to PolyML, and to let you all know
>> how useful it is. The feature allows profiling of objects after garbage
>> collection. When code is compiled with PolyML.compiler.allocationProfiling
>> set to 1, all objects allocated are also given a pointer to their
>> allocating function. When the garbage collector runs with
>> PolyML.compiler.profiling set to 4, a statistical trace is printed of
>> which objects survived garbage collection.
>
> Cool. So we have:
>   profiling = 1  approximates runtime, Monte-Carlo style sampling of the program counter
>   profiling = 2  records the number of words allocated in each function (very accurate IIRC)
>   profiling = 3  ???
>   profiling = 4  counts GC survivors (very accurately?)

Profiling 3 counts the number of cases where the run-time system had to emulate an arithmetic operation because the operation required long-precision arithmetic. This is a LOT more expensive than doing the arithmetic with short-precision ints, so it may be worth recoding hot-spots that show up with this. Profiling 4 has just been added, so it probably has teething troubles. I would really prefer to replace these numbers by a datatype so that users don't have to remember them.

Regards,
David
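A typical profiling session along the lines described in this thread might look as follows. This is a sketch only: the knob names are the ones quoted in the thread (`PolyML.compiler.allocationProfiling`, `PolyML.compiler.profiling`), and their exact spelling and types may differ between Poly/ML versions; `PolyML.fullGC` is the standard Poly/ML full-collection entry point.

```sml
(* Sketch of driving the profiling modes summarized above:
     1 = program-counter sampling, 2 = words allocated per function,
     3 = emulated long-precision arithmetic, 4 = GC survivors *)
PolyML.compiler.allocationProfiling := 1;  (* tag each new object with its allocator *)

(* ... compile and run the code under investigation ... *)

PolyML.compiler.profiling := 4;  (* report survivors at the next collection *)
PolyML.fullGC ();                (* trace: counts of surviving objects per function *)
```

The survivor trace is what the four numbered dumps in Thomas's original message show: one line per allocating function, sorted by count.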
Re: [isabelle-dev] Using PolyML's memory consumption profiling on Isabelle
On Fri, 14 Oct 2011, Andreas Schropp wrote:
> On 10/13/2011 01:24 PM, Thomas Sewell wrote:
>> There are surprisingly many dummy tasks.
>> [...]
>> 918632 Task_Queue.dummy_task(1)

Such numbers always need to be put in relation. The original list looked like this:

  918632 Task_Queue.dummy_task(1)
  ...
  13085440 Term.map_atyps(2)

This means there are 2 orders of magnitude compared to the top entry. Even if such top entries are reduced significantly, the overall impact is very low on average. Addressing the lower entries is normally not worth the effort.

>   val dummy_task = Task(NONE, ~1)
>
> Values are not shared?! What the hell?

This looks like an older version. Thomas was referring to this one in Isabelle/73dde8006820:

  fun dummy_task () =
    Task {group = new_group NONE, name = "", id = 0, pri = NONE,
      timing = new_timing ()};

Since the timing is a mutable variable here, it has to be created afresh for each use -- in Future.value construction. Normally 1 million extra allocations are not a big deal, but an experiment from yesterday shows that there is in fact a measurable impact. See now Isabelle/2afb928c71ca and the corresponding charts at http://www4.in.tum.de/~wenzelm/test/stats/at-poly.html

I can only guess that allocation of mutable stuff costs extra. Anyway, that is just a peephole optimization. The real improvements usually come from looking at the big picture. The very introduction of the dummy tasks for pre-evaluated future values was one such optimization. Another one was the introduction of the timing field for tasks, to improve the overall scheduling and throughput of the worker thread farm that crunches on these tasks.

	Makarius
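The difference between the two `dummy_task` variants quoted above can be made concrete with a self-contained sketch (the task type here is hypothetical, much simpler than the real Task_Queue one): a constant value binding is allocated once and shared by all uses, whereas a function whose body contains a mutable field must allocate a fresh object, and a fresh ref, on every call.

```sml
(* Hypothetical miniature of the Task_Queue situation discussed above. *)
datatype task = Task of {name: string, timing: Time.time ref}

fun new_timing () = ref Time.zeroTime

(* one shared object for all uses -- fine while the payload is constant *)
val shared_dummy = Task {name = "", timing = ref Time.zeroTime}

(* a fresh object (and fresh mutable ref) per call -- unavoidable once
   the timing field is genuinely updated per task *)
fun fresh_dummy () = Task {name = "", timing = new_timing ()}
```

This is why the older `val dummy_task` showed up only once in the heap, while the newer `fun dummy_task ()` accounts for ~918632 surviving objects in the trace.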
Re: [isabelle-dev] Using PolyML's memory consumption profiling on Isabelle
On 10/13/2011 01:24 PM, Thomas Sewell wrote:
> There are surprisingly many dummy tasks.
> [...]
> 918632 Task_Queue.dummy_task(1)

  val dummy_task = Task(NONE, ~1)

Values are not shared?! What the hell?
Re: [isabelle-dev] Using PolyML's memory consumption profiling on Isabelle
On 10/13/2011 01:24 PM, Thomas Sewell wrote:
> Good day all. Just wanted to let the Isabelle developers know about the
> latest feature David Matthews has added to PolyML, and to let you all know
> how useful it is. The feature allows profiling of objects after garbage
> collection. When code is compiled with PolyML.compiler.allocationProfiling
> set to 1, all objects allocated are also given a pointer to their
> allocating function. When the garbage collector runs with
> PolyML.compiler.profiling set to 4, a statistical trace is printed of
> which objects survived garbage collection.

Cool. So we have:
  profiling = 1  approximates runtime, Monte-Carlo style sampling of the program counter
  profiling = 2  records the number of words allocated in each function (very accurate IIRC)
  profiling = 3  ???
  profiling = 4  counts GC survivors (very accurately?)

> This means that for Isabelle we get our first real look at what is taking
> up space at runtime and in images. I include the last 20 lines of traces
> produced at four interesting points
> 1) After including the HOL theories in HOL's ROOT.ML
> 2) After performing a PolyML.shareCommonData after (1)
> 3) After adding a 400-element record to a test theory built on the HOL image.
> 4) After performing a shareCommonData after (3)

These are traces for profiling=4?

> Isabelle is generating a *lot* of copies of types & terms, particularly via
> Term.map_atyps. Since shareCommonData eliminates them, many are duplicates.
> It's possible that further use of the "same" variants from Term_Subst might
> help. It's also possible that the repeated reconstruction is necessary
> (e.g. repeated generalization/abstraction of type variables) and further
> use of the new Term_Sharing mechanism might be the answer.

The way I learned Isabelle programming, one views term traversal as cheap, which seems to be true most of the time, especially when terms are freshly allocated with nice locality properties. Sharing lots of subterms might interfere with this. Isn't this what GC was made for? Why introduce artificial sharing?

BTW: the Coq kernel does huge amounts of sharing IIRC. Should we be concerned, or is this just a thing because of proof terms? Makarius, please comment on this, because now I feel like a wasteful programmer. ;D

> A large quantity of the persistent objects are Table and Net objects, as
> expected. There are surprisingly many dummy tasks.

What is a dummy task?

> A surprisingly large quantity of the persistent objects are associated
> with proof terms and name derivations. This is presumably not
> Thm.name_derivation but inlined code from its subcalls Proofterm.thm_proof,
> proof_combP, proof_combt' and Library.foldl, none of which are listed. If
> indeed these foldl loops are producing this many objects then perhaps the
> work done unconditionally here should be rethought.

For proofs=0? Taking a guess, these might be the PBody thms pointers.

Cheers,
  Andy
[isabelle-dev] Using PolyML's memory consumption profiling on Isabelle
Good day all. Just wanted to let the Isabelle developers know about the latest feature David Matthews has added to PolyML, and to let you all know how useful it is.

The feature allows profiling of objects after garbage collection. When code is compiled with PolyML.compiler.allocationProfiling set to 1, all objects allocated are also given a pointer to their allocating function. When the garbage collector runs with PolyML.compiler.profiling set to 4, a statistical trace is printed of which objects survived garbage collection. This means that for Isabelle we get our first real look at what is taking up space at runtime and in images.

I include the last 20 lines of traces produced at four interesting points:
1) After including the HOL theories in HOL's ROOT.ML
2) After performing a PolyML.shareCommonData after (1)
3) After adding a 400-element record to a test theory built on the HOL image.
4) After performing a shareCommonData after (3)

The output is currently a bit raw, since I'm not sure what the units are (bytes? words? objects?). I suspect objects, since the total is roughly a tenth of the runtime size. The other difficulty in interpreting this is the function call graph. Small functions inlined into others are counted as their parent. Library functions called from many places are counted against all of them simultaneously.

Things for Isabelle developers to note from this:

- Isabelle is generating a *lot* of copies of types & terms, particularly via Term.map_atyps. Since shareCommonData eliminates them, many are duplicates. It's possible that further use of the "same" variants from Term_Subst might help. It's also possible that the repeated reconstruction is necessary (e.g. repeated generalization/abstraction of type variables) and further use of the new Term_Sharing mechanism might be the answer.

- A large quantity of the persistent objects are Table and Net objects, as expected.

- There are surprisingly many dummy tasks.

- A surprisingly large quantity of the persistent objects are associated with proof terms and name derivations. This is presumably not Thm.name_derivation but inlined code from its subcalls Proofterm.thm_proof, proof_combP, proof_combt' and Library.foldl, none of which are listed. If indeed these foldl loops are producing this many objects then perhaps the work done unconditionally here should be rethought.

Traces:

1) After including the HOL theories in HOL's ROOT.ML
   824925 Term_Subst.instT_same(3)subst_typs(1)
   918632 Task_Queue.dummy_task(1)
   942473 Net.insert(4)ins1(2)
   1032196 Term.map_atyps(2)
   1046107 Term_Subst.inst_same(4)subst(1)
   1058304 Term_Sharing.init(2)typ(1)
   1172790 List.map(2)
   1194308 Term.map_atyps(2)
   1356790 List.map(2)
   1377084 Table().modify(3)modfy(1)
   1875680 Term_Sharing.init(2)term(1)
   2688712 Type.strip_sorts(1)
   3082276 Table().modify(3)modfy(1)
   3177593 Function code
   5746085 Strings
   5914517 Unidentified word data
   8216169 Term.map_types(1)map_aux(1)
   9206871 List.map(2)
   9424264 Table().modify(3)modfy(1)
   13085440 Term.map_atyps(2)

2) After performing a PolyML.shareCommonData after (1)
   183492 Table().map_default(3)(1)
   199062 List.map(2)
   220528 Thm.transfer(2)
   294270 Term_Subst.inst_same(4)subst(1)
   350230 Thm.map_tags(1)(1)
   461706 Ord_List.union(3)unio(2)
   464109 Mutable data
   478833 List.map(2)
   562309 Term_Sharing.init(2)term(1)
   581994 Proofterm.prepare_thm_proof(8)
   674516 Thm.name_derivation(2)
   700221 Net.insert(4)ins1(2)
   918632 Task_Queue.dummy_task(1)
   1019993 Term.map_types(1)map_aux(1)
   1193305 Table().modify(3)modfy(1)
   2619583 Table().modify(3)modfy(1)
   2769346 Unidentified word data
   3177590 Function code
   4684257 Strings
   8033629 Table().modify(3)modfy(1)

3) After adding a 400-element record to a test theory built on the HOL image.
   211270 List.map(2)
   549768 Table().modify(3)modfy(1)
   638246 Logic.varifyT_global_same(1)(1)
   844780 Envir.norm_term2(2)norm(1)
   972069 ListPair.map(3)
   1151428 Type.varify_global(3)thaw(1)
   1222416 List.map(2)
   1379140 Logic.varifyT_global_same(1)(1)
   1727520 Term.map_atyps(2)
   2397356 Type.strip_sorts(1)
   2675166 Thm.name_derivation(2)
   6035908 Term_Subst.map_atypsT_same(1)typ(1)
   6558968 Net.insert(4)ins1(2)
   7004076 List.map(2)
   8166615 Same.map(2)
   10044260 Logic.incr_indexes_same(2)incr(2)
   10359792 Term.map_atyps(2)
   14882581 Term.map_types(1)map_aux(1)
   15712284 List.map(2)
   23758208 Term.map_atyps(2)

4) After performing a shareCommonData after (3)
   19086 Ord_List.union(3)unio(2)
   20925 List.map(2)
   25107 Table().modify(3)modfy(1)
   30360 Thm.transfer(2)
   33459 List.map(2)
   49475 Term_Subst.inst_same(4)subst(1)
   54711 Term_Subst.map_types_same(1)term(1)
   55132 Table().modify(3)modfy(1)
   58414 Mutable data
   62876 Strings
   64358 Thm.map_tags(1)(1)
   72438 List.map(2)
   87924 Proofterm.prepare_t