On Thu, Aug 31, 2023 at 9:19 AM John Morris <john.mor...@crunchydata.com> wrote:
> Here is an updated version of the earlier work. > > This version: > 1) Tracks memory as requested by the backend. > 2) Includes allocations made during program startup. > 3) Optimizes the "fast path" to only update two local variables. > 4) Places a cluster wide limit on total memory allocated. > > The cluster wide limit is useful for multi-hosting. One greedy cluster > doesn't starve > the other clusters of memory. > > Note there isn't a good way to track actual memory used by a cluster. > Ideally, we like to get the working set size of each memory segment along > with > the size of the associated kernel data structures. > Gathering that info in a portable way is a "can of worms". > Instead, we're managing memory as requested by the application. > While not identical, the two approaches are strongly correlated. > > The memory model used is > 1) Each process is assumed to use a certain amount of memory > simply by existing. > 2) All pg memory allocations are counted, including those before > the process is fully initialized. > 3) Each process maintains its own local counters. These are the > "truth". > 4) Periodically, > - local counters are added into the global, shared memory > counters. > - pgstats is updated > - total memory is checked. > > For efficiency, the global total is an approximation, not a precise number. > It can be off by as much as 1 MB per process. Memory limiting > doesn't need precision, just a consistent and reasonable approximation. > > Repeating the earlier benchmark test, there is no measurable loss of > performance. > > Hi, In `InitProcGlobal`: + elog(WARNING, "proc init: max_total=%d result=%d\n", max_total_bkend_mem, result); Is the above log for debugging purposes ? Maybe the log level should be changed. + errmsg("max_total_backend_memory %dMB - shared_memory_size %dMB is <= 100MB", The `<=` in the error message is inconsistent with the `max_total_bkend_mem < result + 100` check. Please modify one of them. For update_global_allocation : + Assert((int64)pg_atomic_read_u64(&ProcGlobal->total_bkend_mem_bytes) >= 0); + Assert((int64)pg_atomic_read_u64(&ProcGlobal->global_dsm_allocation) >= 0); Should the assertions be done earlier in the function? For reserve_backend_memory: + return true; + + /* CASE: the new allocation is within bounds. Take the fast path. */ + else if (my_memory.allocated_bytes + size <= allocation_upper_bound) The `else` can be omitted (the preceding if block returns). For `atomic_add_within_bounds_i64` + newval = oldval + add; + + /* check if we are out of bounds */ + if (newval < lower_bound || newval > upper_bound || addition_overflow(oldval, add)) Since the summation is stored in `newval`, you can pass newval to `addition_overflow` so that `addition_overflow` doesn't add them again. There are debug statements, such as: + debug("done\n"); you can drop them in the next patch. Thanks