Hi Gergo,

I looked in the detailed pane, searched for ModuleGraph, hovered my mouse over the `ModuleGraph` constructor, recorded the number of live bytes, and divided that by 32 to get the number of live `ModuleGraph`s.
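The arithmetic above can be sketched as follows. This is only an illustration: the divisor of 32 bytes per `ModuleGraph` constructor closure is the figure used in this message, the names `closureBytes` and `liveClosures` are made up for the example, and the byte count in the usage note is back-computed from the 2301 figure quoted below rather than read from the eventlog.

```haskell
-- | Bytes occupied by one ModuleGraph constructor closure.
-- Assumption: 32, the divisor used in the message above.
closureBytes :: Int
closureBytes = 32

-- | Estimate how many closures are live, given the "live bytes" figure
-- that the eventlog's detailed pane shows for the constructor.
liveClosures :: Int -> Int
liveClosures liveBytes = liveBytes `div` closureBytes
```

For instance, a reading of 73632 live bytes would correspond to `liveClosures 73632 == 2301` graphs.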
Matt

On Tue, Apr 1, 2025 at 7:04 AM Erdi, Gergo <gergo.e...@sc.com> wrote:

> PUBLIC
>
> OK scratch that, I was looking at wrong ghc-debug output. Indeed there are
> 2301 ModuleGraphs in the heap at the end of typechecking :O
>
> *From:* Erdi, Gergo
> *Sent:* Tuesday, April 1, 2025 12:51 PM
> *To:* Matthew Pickering <matthewtpicker...@gmail.com>
> *Cc:* GHC Devs <ghc-devs@haskell.org>; ÉRDI Gergő <ge...@erdi.hu>;
> Montelatici, Raphael Laurent <raphael.montelat...@sc.com>; Dijkstra, Atze
> <atze.dijks...@sc.com>
> *Subject:* Re: GHC memory usage when typechecking from source vs. loading
> ModIfaces
>
> This sounds extremely interesting, but I don’t understand where you are
> getting this number from! How do you see in the eventlog HTMLs that I’ve
> included that there are ~2000 ModuleGraphs? I’ve now tried using ghc-debug
> to find all ModuleGraph constructors at two points in the run: just before
> typechecking the first module (after all the extendMG calls) and just after
> typechecking the last module, and even in the cold case I only see 1
> ModuleGraph before and 13 ModuleGraphs after.
>
> Also, what do you mean by “precisely one loaded per interface loaded into
> the EPS”? Since my repro has 2294 modules, wouldn’t that mean 2294
> ModuleGraphs by that metric?
>
> *From:* Matthew Pickering <matthewtpicker...@gmail.com>
> *Sent:* Saturday, March 29, 2025 1:53 AM
> *To:* Erdi, Gergo <gergo.e...@sc.com>
> *Cc:* GHC Devs <ghc-devs@haskell.org>; ÉRDI Gergő <ge...@erdi.hu>;
> Montelatici, Raphael Laurent <raphael.montelat...@sc.com>; Dijkstra, Atze
> <atze.dijks...@sc.com>
> *Subject:* [External] Re: GHC memory usage when typechecking from source
> vs. loading ModIfaces
>
> Hi Gergo,
>
> I quickly tried building `Cabal` with the master branch. There is
> precisely 1 ModuleGraph allocated for the home session, and precisely one
> loaded per interface loaded into the EPS. No leaky behaviour like you can
> see in your eventlogs.
>
> It seems there are about 2000 live module graphs in your program, are you
> doing something with the API to create this many?
>
> Cheers,
>
> Matt
>
> On Fri, Mar 28, 2025 at 12:40 PM Matthew Pickering <
> matthewtpicker...@gmail.com> wrote:
>
> Hi Gergo,
>
> Do you have a (synthetic?) reproducer? You have probably identified some
> memory leak. However, without any means to reproduce it becomes very
> difficult to investigate. I feel like we are getting into very precise
> details now, where speculating is not going to be so useful.
>
> It seems like this is an important thing for you and your company. Is
> there any budget to pay for some investigation? If that was the case then
> some effort could be made to create a synthetic reproducer and make the
> situation more robust going into the future if your requirements were
> precisely understood.
>
> Cheers,
>
> Matt
>
> On Fri, Mar 28, 2025 at 10:12 AM Erdi, Gergo <gergo.e...@sc.com> wrote:
>
> PUBLIC
>
> Just to add that I get the same "equalizing" behaviour (but in a more
> "natural" way) if instead of deepseq-ing the ModuleGraph upfront, I just
> call `hugInstancesBelow` before processing each module. So that's
> definitely one source of extra memory usage. I wonder if it would be
> possible to rebuild the ModuleGraph periodically (similar to the ModDetails
> dehydration), or if there are references to it stored all over the place
> from `HscEnv`s scattered around in closures etc. (basically the same
> problem the HPT had before it was made into a mutable reference).
>
> -----Original Message-----
> From: ghc-devs <ghc-devs-boun...@haskell.org> On Behalf Of Erdi, Gergo
> via ghc-devs
> Sent: Friday, March 28, 2025 4:49 PM
> To: Matthew Pickering <matthewtpicker...@gmail.com>; GHC Devs <
> ghc-devs@haskell.org>
> Cc: ÉRDI Gergő <ge...@erdi.hu>; Montelatici, Raphael Laurent <
> raphael.montelat...@sc.com>; Dijkstra, Atze <atze.dijks...@sc.com>
> Subject: [External] Re: GHC memory usage when typechecking from source vs.
> loading ModIfaces
>
> Hi,
>
> Unfortunately, I am forced to return to this problem. Everything below is
> now in the context of GHC 9.12 plus the mutable HPT patch backported.
>
> My test case is typechecking a tree of 2294 modules that form the
> transitive closure of a single module's dependencies, all in a single
> process. I have done this typechecking three times, here's what `+RTS -s
> -RTS` reports for max residency:
>
> * "cold": With no on-disk `ModIface` files, i.e. from scratch: 537 MB
>
> * "cold-top": With all `ModIface`s already on disk, except for the
> single top-level module: 302 MB
>
> * "warm": With all `ModIface`s already on disk: 211 MB
>
> So my stupidly naive question is, why is the "cold" case also not 302 MB?
>
> In earlier discussion, `ModDetails` unfolding has come up. Dehydrating
> `ModDetails` in the HPT all the time is disastrous for runtime, but based
> on this model I would expect to see improvements from dehydrating "every
> now and then".
> So I tried a stupid simple example where after every 100th
> typechecked module, I run this function on the topologically sorted list of
> modules processed so far:
>
> ```
> dehydrateHpt :: HscEnv -> [ModuleName] -> IO ()
> dehydrateHpt hsc_env mods = do
>   let HPT{ table = hptr } = hsc_HPT hsc_env
>   hpt <- readIORef hptr
>   for_ mods \mod -> for_ (lookupUDFM hpt mod) \(HomeModInfo iface _details _linkable) -> do
>     -- Rebuild the ModDetails from the interface, forcing the result to WHNF.
>     !details <- initModDetails hsc_env iface
>     pure ()
> ```
>
> Buuut the max residency is still 534 MB (see "cold-dehydrate"); in fact,
> the profile looks exactly the same.
>
> Speaking of the profile, in the "cold" case I see a lot of steadily
> increasing heap usage from the `ModuleGraph`. I could see this happening if
> typechecking from scratch involves more `modulesUnder` calls which in turn
> force more and more of the `ModuleGraph`. If so, then maybe this could be
> worked around by repeatedly remaking the `ModuleGraph` just like I remake
> the `ModDetails` above. I tried getting rid of this effect by `deepseq`'ing
> the `ModuleGraph` at the start, with the idea being that this should
> "equalize" the three scenarios if this really is a substantial source of
> extra memory usage. This pushes up the warm case's memory usage to 381 MB,
> which is promising, but I still see a `Word64Map` that is steadily
> increasing in the "cold-force-modulegraph" case and contributes a lot to
> the memory usage. Unfortunately, I don't know where that `Word64Map` is (it
> could be any `Unique`-keyed environment...).
>
> So I am now stuck at this point. To spell out my goal explicitly, I would
> like to typecheck one module after another and not keep anything more in
> memory around than if I loaded them from `ModIface` files.
>
> Thanks,
> Gergo
>
> p.s.: I couldn't find a way in the EventLog output HTML to turn event
> markers on/off or filter them, so to avoid covering the whole graph with
> gray lines, I mark only every 100th module.
>
> From: Matthew Pickering <matthewtpicker...@gmail.com>
> Sent: Wednesday, February 12, 2025 7:08 PM
> To: ÉRDI Gergő <ge...@erdi.hu>
> Cc: Erdi, Gergo <gergo.e...@sc.com>; Zubin Duggal <zu...@well-typed.com>;
> Montelatici, Raphael Laurent <raphael.montelat...@sc.com>; GHC Devs <
> ghc-devs@haskell.org>
> Subject: [External] Re: GHC memory usage when typechecking from source vs.
> loading ModIfaces
>
> You do also raise a good point about rehydration costs.
>
> In oneshot mode, you are basically rehydrating the entire transitive
> closure of each module when you compile it, which obviously results in a
> large amount of repeated work. This is why people are investigating ideas
> of a persistent worker to at least avoid rehydrating all external
> dependencies as well.
>
> On Mon, Feb 10, 2025 at 12:13 PM Matthew Pickering <mailto:
> matthewtpicker...@gmail.com> wrote:
> Sure, you can remove them once you are sure they are not used anymore.
>
> For clients like `GHCi` that doesn't work obviously as they can be used at
> any point in the future but for a batch compiler it would be fine.
>
> On Mon, Feb 10, 2025 at 11:56 AM ÉRDI Gergő <mailto:ge...@erdi.hu> wrote:
> On Mon, 10 Feb 2025, Matthew Pickering wrote:
>
> > I wonder if you have got your condition the wrong way around.
> >
> > The only "safe" time to perform rehydration is AFTER the point it can
> > never be used again.
> >
> > If you rehydrate it just before it is used then you will repeat work
> > which has already been done. If you do this, you will always have a
> > trade-off between space used and runtime.
>
> Oops. Yes, I have misunderstood the idea.
> I thought the idea was that
> after loading a given module into the HPT, its ModDetails would start out
> small (because of laziness) and then keep growing in size as more and more
> of it are traversed, and thus forced, during the typechecking of its
> dependees, so at some point we would want to reset that into the small
> initial representation as created by initModDetails.
>
> But if the idea is that I should rehydrate modules when they can't be used
> anymore, then that brings up the question why even do that, instead of
> straight removing the HomeModInfos from the HPT?
>
> ----------------------------------------------------------------------
>
> ------------------------------
> This email and any attachments are confidential and may also be
> privileged. If you are not the intended recipient, please delete all copies
> and notify the sender immediately. You may wish to refer to the
> incorporation details of Standard Chartered PLC, Standard Chartered Bank
> and their subsidiaries together with Standard Chartered Bank’s Privacy
> Policy via our public website.
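The "every 100th module" scheme from the quoted thread can be sketched independently of the GHC API. This is a minimal sketch, not the code that was actually run: `checkpoints` and `processWithPeriodicCleanup` are hypothetical names, `step` stands in for typechecking one module, and `cleanup` stands in for a function like the quoted `dehydrateHpt`, which receives the modules processed so far.

```haskell
import Control.Monad (when)
import Data.Foldable (for_)

-- | The prefixes of the work list on which the cleanup action would run:
-- the first n items, the first 2n items, and so on.
checkpoints :: Int -> [a] -> [[a]]
checkpoints n items
  | n <= 0    = []
  | otherwise = [take i items | i <- [n, 2 * n .. length items]]

-- | Run @step@ on every item in order; after every n-th item, run
-- @cleanup@ on all items processed so far (in processing order).
processWithPeriodicCleanup
  :: Int             -- ^ period, e.g. 100
  -> ([a] -> IO ())  -- ^ cleanup over the prefix processed so far
  -> (a -> IO ())    -- ^ per-item work
  -> [a]             -- ^ items, assumed to be topologically sorted
  -> IO ()
processWithPeriodicCleanup n cleanup step items =
  for_ (zip [1 :: Int ..] items) $ \(i, item) -> do
    step item
    when (n > 0 && i `mod` n == 0) (cleanup (take i items))
```

With a period of 100 this would be called as `processWithPeriodicCleanup 100 cleanup step modules`. Note that in the experiment reported in the thread, running the cleanup on this schedule did not change the max residency (still 534 MB), so the schedule alone is not a fix.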
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs