On Thu, Dec 22, 2022 at 7:24 PM John Naylor <john.nay...@enterprisedb.com> wrote:
>
> On Wed, Dec 21, 2022 at 3:09 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> >
> > On Tue, Dec 20, 2022 at 3:09 PM John Naylor
> > <john.nay...@enterprisedb.com> wrote:
> > >
> > > https://www.postgresql.org/message-id/20220704211822.kfxtzpcdmslzm2dy%40awork3.anarazel.de
> > >
> > > I'm guessing the hash join case can afford to be precise about memory
> > > because it must spill to disk when exceeding workmem. We don't have that
> > > design constraint.
> >
> > You mean that the memory used by the radix tree should be limited not
> > by the amount of memory actually used, but by the amount of memory
> > allocated? In other words, it checks by MemoryContextMemAllocated() in
> > the local cases and by dsa_get_total_size() in the shared case.
>
> I mean, if this patch set uses 10x less memory than v15 (not always, but easy
> to find cases where it does), and if it's also expensive to track memory use
> precisely, then we don't have an incentive to track memory precisely. Even if
> we did, we don't want to assume that every future caller of radix tree is
> willing to incur that cost.
Understood.

>
> > The idea of using up to half of maintenance_work_mem might be a good
> > idea compared to the current flat-array solution. But since it only
> > uses half, I'm concerned that there will be users who double their
> > maintenance_work_mem. When it is improved, the user needs to restore
> > maintenance_work_mem again.
>
> I find it useful to step back and look at the usage patterns:
>
> Autovacuum: Limiting the memory allocated by vacuum is important, since there
> are multiple workers and they can run at any time (possibly most of the
> time). This case will not use parallel index vacuum, so will use slab, where
> the quick estimation of memory taken by the context is not terribly far off,
> so we can afford to be more optimistic here.
>
> Manual vacuum: The default configuration assumes we want to finish as soon as
> possible (vacuum_cost_delay is zero). Parallel index vacuum can be used. My
> experience leads me to believe users are willing to use a lot of memory to
> make manual vacuum finish as quickly as possible, and are disappointed to
> learn that even if maintenance work mem is 10GB, vacuum can only use 1GB.

Agreed.

> So I don't believe anyone will have to double maintenance work mem after
> upgrading (even with pessimistic accounting) because we'll be both
> - much more efficient with memory on average
> - free from the 1GB cap

Makes sense.

> That said, it's possible 50% is too pessimistic -- a 75% threshold will bring
> us very close to powers of two for example:
>
> 2*(1+2+4+8+16+32+64+128) + 256 = 766MB (74.8% of 1GB) -> keep going
> 766 + 256 = 1022MB -> stop
>
> I'm not sure if that calculation could cause going over the limit, or how
> common that would be.

If the value is a power of 2, it seems to work perfectly fine. But if it's
700MB, for example, the total memory exceeds the limit:

2*(1+2+4+8+16+32+64+128) = 510MB (72.8% of 700MB) -> keep going
510 + 256 = 766MB -> stop, but it exceeds the limit.
In a bigger case, if it's 11000MB:

2*(1+2+...+2048) = 8190MB (74.4%) -> keep going
8190 + 4096 = 12286MB -> stop, but it exceeds the limit.

That being said, I don't think these are common cases, so the 75% threshold
seems to work fine in most cases.
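The pattern in the calculations above can be sketched as a small simulation
(a rough model only, not the actual dsa.c logic): segment sizes double, two
segments are created at each size (hence the "2*(...)" factor), and we stop
requesting new segments once the total allocation reaches 75% of the limit.
The function name and parameters here are hypothetical, for illustration:

```python
def allocated_at_stop(limit_mb, threshold=0.75, segs_per_size=2, first_mb=1):
    """Simulate doubling segment allocation (two segments per size) and
    return the total MB allocated when the 75%-of-limit check stops us."""
    total = 0
    size = first_mb
    count = 0
    # Keep allocating while total allocation is below the threshold.
    while total < threshold * limit_mb:
        total += size
        count += 1
        if count % segs_per_size == 0:
            size *= 2  # next segments are twice as large
    return total

# 1GB (a power of 2): stops at 1022MB, within the limit.
print(allocated_at_stop(1024))   # 1022
# 700MB: stops at 766MB, exceeding the limit.
print(allocated_at_stop(700))    # 766
# 11000MB: stops at 12286MB, exceeding the limit.
print(allocated_at_stop(11000))  # 12286
```

This reproduces the three cases discussed: overshoot only happens when the
last (largest) segment pushes the total past the limit, which matches the
non-power-of-2 examples above.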