What command do I run to generate the files from this patch file? Perhaps a link to a git repo would be a suitable way to share the reproducer?
On Wed, Apr 2, 2025 at 10:26 AM Erdi, Gergo <gergo.e...@sc.com> wrote:

> PUBLIC
>
> Hi Matt,
>
> I think I have something that might demonstrate that GHC (at least GHC
> 9.12.1) might have a similar problem!
>
> With the attached vacuous module hierarchy, I tried compiling M2294 from
> scratch, and then with `.hi` files for everything except the toplevel
> module. I did the same with our GHC-API-using compiler as well. As you can
> see from the attached event logs, while the details differ, the overall
> shape of the memory used by ModuleGraph edges (750k of GWIB and
> NodeKey_Module constructors for the 2321 ModuleNodes and ~60k direct
> dependency edges) is pretty much the same between our compiler and GHC
> 9.12, suggesting to me that GHC is duplicating ModuleGraph node information
> in the dependency edges when building the transitive closure.
>
> Based on these measurements, do you agree that this is a GHC-side problem
> of memory usage scaling quadratically with the number of dependency edges?
>
> Thanks,
> Gergo
>
> p.s.: Sorry for including the reproducer module tree in this weird format
> as a patch file, but I am behind a mail server that won’t let me send mails
> with too many individual files in attached archives…
>
> *From:* Matthew Pickering <matthewtpicker...@gmail.com>
> *Sent:* Friday, March 28, 2025 8:40 PM
> *To:* Erdi, Gergo <gergo.e...@sc.com>
> *Cc:* GHC Devs <ghc-devs@haskell.org>; ÉRDI Gergő <ge...@erdi.hu>;
> Montelatici, Raphael Laurent <raphael.montelat...@sc.com>; Dijkstra, Atze
> <atze.dijks...@sc.com>
> *Subject:* [External] Re: GHC memory usage when typechecking from source
> vs. loading ModIfaces
>
> Hi Gergo,
>
> Do you have a (synthetic?) reproducer? You have probably identified some
> memory leak. However, without any means to reproduce it becomes very
> difficult to investigate. I feel like we are getting into very precise
> details now, where speculating is not going to be so useful.
>
> It seems like this is an important thing for you and your company. Is
> there any budget to pay for some investigation? If that was the case then
> some effort could be made to create a synthetic reproducer and make the
> situation more robust going into the future if your requirements were
> precisely understood.
>
> Cheers,
>
> Matt
>
> On Fri, Mar 28, 2025 at 10:12 AM Erdi, Gergo <gergo.e...@sc.com> wrote:
>
> PUBLIC
>
> Just to add that I get the same "equalizing" behaviour (but in a more
> "natural" way) if instead of deepseq-ing the ModuleGraph upfront, I just
> call `hugInstancesBelow` before processing each module. So that's
> definitely one source of extra memory usage. I wonder if it would be
> possible to rebuild the ModuleGraph periodically (similar to the ModDetails
> dehydration), or if there are references to it stored all over the place
> from `HscEnv`s scattered around in closures etc. (basically the same
> problem the HPT had before it was made into a mutable reference).
>
> -----Original Message-----
> From: ghc-devs <ghc-devs-boun...@haskell.org> On Behalf Of Erdi, Gergo
> via ghc-devs
> Sent: Friday, March 28, 2025 4:49 PM
> To: Matthew Pickering <matthewtpicker...@gmail.com>; GHC Devs <ghc-devs@haskell.org>
> Cc: ÉRDI Gergő <ge...@erdi.hu>; Montelatici, Raphael Laurent
> <raphael.montelat...@sc.com>; Dijkstra, Atze <atze.dijks...@sc.com>
> Subject: [External] Re: GHC memory usage when typechecking from source vs.
> loading ModIfaces
>
> Hi,
>
> Unfortunately, I am forced to return to this problem. Everything below is
> now in the context of GHC 9.12 plus the mutable HPT patch backported.
>
> My test case is typechecking a tree of 2294 modules that form the
> transitive closure of a single module's dependencies, all in a single
> process. I have done this typechecking three times, here's what
> `+RTS -s -RTS` reports for max residency:
>
> * "cold": With no on-disk `ModIface` files, i.e.
>   from scratch: 537 MB
>
> * "cold-top": With all `ModIface`s already on disk, except for the
>   single top-level module: 302 MB
>
> * "warm": With all `ModIface`s already on disk: 211 MB
>
> So my stupidly naive question is, why is the "cold" case also not 302 MB?
>
> In earlier discussion, `ModDetails` unfolding has come up. Dehydrating
> `ModDetails` in the HPT all the time is disastrous for runtime, but based
> on this model I would expect to see improvements from dehydrating "every
> now and then". So I tried a stupid simple example where after every 100th
> typechecked module, I run this function on the topologically sorted list of
> modules processed so far:
>
> ```
> dehydrateHpt :: HscEnv -> [ModuleName] -> IO ()
> dehydrateHpt hsc_env mods = do
>   let HPT{ table = hptr } = hsc_HPT hsc_env
>   hpt <- readIORef hptr
>   for_ mods \mod -> for_ (lookupUDFM hpt mod) \(HomeModInfo iface _details _linkable) -> do
>     !details <- initModDetails hsc_env iface
>     pure ()
> ```
>
> Buuut the max residency is still 534 MB (see "cold-dehydrate"); in fact,
> the profile looks exactly the same.
>
> Speaking of the profile, in the "cold" case I see a lot of steadily
> increasing heap usage from the `ModuleGraph`. I could see this happening if
> typechecking from scratch involves more `modulesUnder` calls which in turn
> force more and more of the `ModuleGraph`. If so, then maybe this could be
> worked around by repeatedly remaking the `ModuleGraph` just like I remake
> the `ModDetails` above. I tried getting rid of this effect by `deepseq`'ing
> the `ModuleGraph` at the start, with the idea being that this should
> "equalize" the three scenarios if this really is a substantial source of
> extra memory usage. This pushes up the warm case's memory usage to 381 MB,
> which is promising, but I still see a `Word64Map` that is steadily
> increasing in the "cold-force-modulegraph" case and contributes a lot to
> the memory usage.
> Unfortunately, I don't know where that `Word64Map` is (it
> could be any `Unique`-keyed environment...).
>
> So I am now stuck at this point. To spell out my goal explicitly, I would
> like to typecheck one module after another and not keep anything more in
> memory around than if I loaded them from `ModIface` files.
>
> Thanks,
> Gergo
>
> p.s.: I couldn't find a way in the EventLog output HTML to turn event
> markers on/off or filter them, so to avoid covering the whole graph with
> gray lines, I mark only every 100th module.
>
> From: Matthew Pickering <matthewtpicker...@gmail.com>
> Sent: Wednesday, February 12, 2025 7:08 PM
> To: ÉRDI Gergő <ge...@erdi.hu>
> Cc: Erdi, Gergo <gergo.e...@sc.com>; Zubin Duggal <zu...@well-typed.com>;
> Montelatici, Raphael Laurent <raphael.montelat...@sc.com>; GHC Devs <ghc-devs@haskell.org>
> Subject: [External] Re: GHC memory usage when typechecking from source vs.
> loading ModIfaces
>
> You do also raise a good point about rehydration costs.
>
> In oneshot mode, you are basically rehydrating the entire transitive
> closure of each module when you compile it, which obviously results in a
> large amount of repeated work. This is why people are investigating ideas
> of a persistent worker to at least avoid rehydrating all external
> dependencies as well.
>
> On Mon, Feb 10, 2025 at 12:13 PM Matthew Pickering <mailto:matthewtpicker...@gmail.com> wrote:
> Sure, you can remove them once you are sure they are not used anymore.
>
> For clients like `GHCi` that doesn't work obviously as they can be used at
> any point in the future but for a batch compiler it would be fine.
>
> On Mon, Feb 10, 2025 at 11:56 AM ÉRDI Gergő <mailto:ge...@erdi.hu> wrote:
> On Mon, 10 Feb 2025, Matthew Pickering wrote:
>
> > I wonder if you have got your condition the wrong way around.
> >
> > The only "safe" time to perform rehydration is AFTER the point it can
> > never be used again.
> >
> > If you rehydrate it just before it is used then you will repeat work
> > which has already been done. If you do this, you will always have a
> > trade-off between space used and runtime.
>
> Oops. Yes, I have misunderstood the idea. I thought the idea was that
> after loading a given module into the HPT, its ModDetails would start out
> small (because of laziness) and then keep growing in size as more and more
> of it are traversed, and thus forced, during the typechecking of its
> dependees, so at some point we would want to reset that into the small
> initial representation as created by initModDetails.
>
> But if the idea is that I should rehydrate modules when they can't be used
> anymore, then that brings up the question why even do that, instead of
> straight removing the HomeModInfos from the HPT?
>
> ----------------------------------------------------------------------
> This email and any attachments are confidential and may also be
> privileged. If you are not the intended recipient, please delete all copies
> and notify the sender immediately. You may wish to refer to the
> incorporation details of Standard Chartered PLC, Standard Chartered Bank
> and their subsidiaries together with Standard Chartered Bank’s Privacy
> Policy via our public website.
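
To make the scaling question from the quoted thread concrete, here is a minimal toy sketch. It is not GHC's `ModuleGraph` code: `Graph`, `chain`, `closure`, `directEdges` and `closureEntries` are hypothetical names made up for illustration, and it assumes module keys are numbered so that every import has a smaller key than its importer. It only counts how many entries a fully materialised transitive closure holds compared with the number of direct edges.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Toy stand-in for a module graph (NOT GHC's ModuleGraph):
-- each module, identified by an Int key, maps to its direct imports.
type Graph = Map.Map Int [Int]

-- A linear chain of n modules: module i imports only module i-1.
chain :: Int -> Graph
chain n = Map.fromList [ (i, [i - 1 | i > 0]) | i <- [0 .. n - 1] ]

-- Transitive closure, built in ascending key order.
-- Assumption: every import has a smaller key than its importer,
-- so the closure of each dependency is already in the accumulator.
closure :: Graph -> Map.Map Int (Set.Set Int)
closure = Map.foldlWithKey' step Map.empty
  where
    step acc m deps =
      let below d = Set.insert d (Map.findWithDefault Set.empty d acc)
      in  Map.insert m (Set.unions (map below deps)) acc

-- Number of direct dependency edges.
directEdges :: Graph -> Int
directEdges = sum . map length . Map.elems

-- Number of (module, reachable module) pairs in the closure.
closureEntries :: Graph -> Int
closureEntries = sum . map Set.size . Map.elems . closure
```

For `chain 100` this gives 99 direct edges but 4950 closure entries, i.e. n(n-1)/2 for n modules. Persistent sets share structure, so this counts logical entries rather than heap bytes; whether the real representation shares or duplicates node data per entry is what the measurements in the thread are probing.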
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs