On 12/26/23 11:49, Anton A. Melnikov wrote:
> Hello!
>
> Earlier in this thread, pgbench results were published where, with a
> strong memory limit of 100MB, a significant decrease in TPS of about
> 10% was observed [1].
>
> Using a dedicated server with 12GB RAM and the methodology described
> in [3], I performed five series of measurements for the patches
> from [2].

Can you share some info about the hardware? For example the CPU model,
number of cores, and so on. 12GB RAM is not that large, so presumably
it was a fairly small machine.

> The series were like this:
> 1) unpatched REL_16_BETA1 (e0b82fc8e83), as close in time to [2].
> 2) patched REL_16_BETA1 at e0b82fc8e83 with the max_total_backend_memory
> GUC undefined (default value = 0).
> 3) patched REL_16_BETA1 with max_total_backend_memory = 16GB
> 4) the same with max_total_backend_memory = 8GB
> 5) and again with max_total_backend_memory = 200MB

OK

> Measurements with max_total_backend_memory = 100MB could not be carried
> out; with a 100MB limit the server gave an error on startup:
>
>   FATAL: configured max_total_backend_memory 100MB is <=
>          shared_memory_size 143MB
>
> So I used 200MB to keep all other GUCs the same.

I'm not very familiar with the patch yet, but this seems a bit strange.
Why should shared_buffers be included in this limit?

> Pgbench gave the following results:
> 1) and 2) are almost the same: ~6350 TPS. See the orange and green
> distributions on the attached graph.png, respectively.
> 3) and 4) are identical to each other (~6315 TPS) and a bit slower
> than 1) and 2) by ~0.6%. See the blue and yellow distributions,
> respectively.
> 5) is slightly slower (~6285 TPS) than 3) and 4) by another 0.5%
> (grey distribution).
> The standard error in all series was ~0.2%. The raw data is in
> raw_data.txt.

I think 6350 is a pretty terrible number, especially for scale 8, which
is maybe 150MB of data. I think that's a pretty clear sign the system
was hitting some other bottleneck, which can easily mask regressions in
the memory allocation code. AFAICS the pgbench runs were regular r/w
benchmarks, so I'd bet it was hitting I/O, and possibly even subject to
some random effects at that level.

I think what would be interesting are runs with

    pgbench -M prepared -S -c $N -j $N

i.e. read-only tests (to not hit I/O), with $N sufficiently large to
maybe also show some concurrency/locking bottlenecks, etc.

I may do some benchmarks if I happen to find a bit of time, but maybe
you could collect such numbers too? The other benchmark that might be
interesting is more OLAP-like, with low concurrency but backends
allocating a lot of memory.
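For the read-only case, a concrete run might look something like this
(just a sketch; the duration, client counts, and the "bench" database
name are placeholders, not values from the runs above):

    # initialize at the same scale as the earlier runs
    pgbench -i -s 8 bench

    # read-only runs with prepared statements, increasing concurrency
    for N in 16 32 64 128; do
        pgbench -M prepared -S -c $N -j $N -T 60 bench
    done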
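And for the OLAP-like case, one option is a custom pgbench script with
a memory-hungry query, e.g. (again only a sketch; the work_mem value,
row count, and client count are arbitrary):

    -- olap.sql
    SET work_mem = '256MB';
    SELECT count(*) FROM
      (SELECT g FROM generate_series(1, 10000000) g ORDER BY g DESC) s;

run at low concurrency:

    pgbench -n -f olap.sql -c 2 -j 2 -T 60 bench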
regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company