On Thu, Dec 22, 2022 at 7:24 PM John Naylor <john.nay...@enterprisedb.com> wrote:
>
> On Wed, Dec 21, 2022 at 3:09 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> >
> > On Tue, Dec 20, 2022 at 3:09 PM John Naylor
> > <john.nay...@enterprisedb.com> wrote:
> > >
> > > https://www.postgresql.org/message-id/20220704211822.kfxtzpcdmslzm2dy%40awork3.anarazel.de
> > >
> > > I'm guessing the hash join case can afford to be precise about memory
> > > because it must spill to disk when exceeding workmem. We don't have that
> > > design constraint.
> >
> > You mean that the memory used by the radix tree should be limited not
> > by the amount of memory actually used, but by the amount of memory
> > allocated? In other words, it checks by MemoryContextMemAllocated() in
> > the local cases and by dsa_get_total_size() in the shared case.
>
> I mean, if this patch set uses 10x less memory than v15 (not always, but easy
> to find cases where it does), and if it's also expensive to track memory use
> precisely, then we don't have an incentive to track memory precisely. Even if
> we did, we don't want to assume that every future caller of radix tree is
> willing to incur that cost.
Understood.

>
> > The idea of using up to half of maintenance_work_mem might be a good
> > idea compared to the current flat-array solution. But since it only
> > uses half, I'm concerned that there will be users who double their
> > maintenance_work_mem. When it is improved, the user needs to restore
> > maintenance_work_mem again.
>
> I find it useful to step back and look at the usage patterns:
>
> Autovacuum: Limiting the memory allocated by vacuum is important, since there
> are multiple workers and they can run at any time (possibly most of the
> time). This case will not use parallel index vacuum, so will use slab, where
> the quick estimation of memory taken by the context is not terribly far off,
> so we can afford to be more optimistic here.
>
> Manual vacuum: The default configuration assumes we want to finish as soon as
> possible (vacuum_cost_delay is zero). Parallel index vacuum can be used. My
> experience leads me to believe users are willing to use a lot of memory to
> make manual vacuum finish as quickly as possible, and are disappointed to
> learn that even if maintenance work mem is 10GB, vacuum can only use 1GB.

Agreed.

> So I don't believe anyone will have to double maintenance work mem after
> upgrading (even with pessimistic accounting) because we'll be both
> - much more efficient with memory on average
> - free from the 1GB cap

Makes sense.

> That said, it's possible 50% is too pessimistic -- a 75% threshold will bring
> us very close to powers of two for example:
>
> 2*(1+2+4+8+16+32+64+128) + 256 = 766MB (74.8% of 1GB) -> keep going
> 766 + 256 = 1022MB -> stop
>
> I'm not sure if that calculation could cause going over the limit, or how
> common that would be.

If the value is a power of 2, it seems to work perfectly fine. But if it's
700MB, for example, the total memory exceeds the limit:

2*(1+2+4+8+16+32+64+128) = 510MB (72.8% of 700MB) -> keep going
510 + 256 = 766MB -> stop, but it exceeds the limit.
In a bigger case, if it's 11000MB:

2*(1+2+...+2048) = 8190MB (74.4%) -> keep going
8190 + 4096 = 12286MB -> stop, but it exceeds the limit.

That being said, I don't think these are common cases, so the 75% threshold
seems to work fine in most cases.
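The pattern in the calculations above can be sketched as a small simulation
(a rough model only, not the actual dsa.c logic): segment sizes double, two
segments are created at each size (hence the "2*(...)" factor), and we stop
requesting new segments once the total allocation reaches 75% of the limit.
The function name and parameters here are hypothetical, for illustration:

```python
def allocated_at_stop(limit_mb, threshold=0.75, segs_per_size=2, first_mb=1):
    """Simulate doubling segment allocation (two segments per size) and
    return the total MB allocated when the 75%-of-limit check stops us."""
    total = 0
    size = first_mb
    count = 0
    # Keep allocating while total allocation is below the threshold.
    while total < threshold * limit_mb:
        total += size
        count += 1
        if count % segs_per_size == 0:
            size *= 2  # next segments are twice as large
    return total

# 1GB (a power of 2): stops at 1022MB, within the limit.
print(allocated_at_stop(1024))   # 1022
# 700MB: stops at 766MB, exceeding the limit.
print(allocated_at_stop(700))    # 766
# 11000MB: stops at 12286MB, exceeding the limit.
print(allocated_at_stop(11000))  # 12286
```

This reproduces the three cases discussed: overshoot only happens when the
last (largest) segment pushes the total past the limit, which matches the
non-power-of-2 examples above.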