Hi Tomas,

Here is the patch. It should be compatible with your patch, and it focuses on when to regrow the batch.
On Tue, May 28, 2019 at 3:40 PM Hubert Zhang <hzh...@pivotal.io> wrote:

> On Sat, May 4, 2019 at 8:34 AM Tomas Vondra <tomas.von...@2ndquadrant.com>
> wrote:
>
>> The root cause is that hash join treats batches as pretty much free, but
>> that's not really true - we do allocate two BufFile structs per batch,
>> and each BufFile is ~8kB as it includes PGAlignedBuffer.
>>
>> The OOM is not very surprising, because with 524288 batches it'd need
>> about 8GB of memory, and the system only has 8GB RAM installed.
>>
>> The second patch tries to enforce work_mem more strictly. That would be
>> impossible if we were to keep all the BufFile structs in memory, so
>> instead it slices the batches into chunks that fit into work_mem, and
>> then uses a single "overflow" file for slices currently not in memory.
>> These extra slices can't be counted into work_mem, but we should need
>> just very few of them. For example with work_mem=4MB the slice is 128
>> batches, so we need 128x fewer overflow files (compared to per-batch).
>>
> Hi Tomas,
>
> I read your second patch, which uses overflow BufFiles to reduce the
> total number of batches.
> It would solve the hash join OOM problem you discussed above: 8KB per
> batch leads to the batch bloating problem.
>
> I mentioned in another thread:
> https://www.postgresql.org/message-id/flat/CAB0yrekv%3D6_T_eUe2kOEvWUMwufcvfd15SFmCABtYFOkxCFdfA%40mail.gmail.com
> that there is another hash join OOM problem, caused by disabling batch
> splitting too early. PG uses the flag hashtable->growEnable to determine
> whether to split batches. Once one split fails (all the tuples are
> assigned to only one of the two resulting batches), the growEnable flag
> is turned off forever.
>
> This is the opposite side of the batch bloating problem: it keeps too
> few batches and makes the in-memory hash table too large to fit into
> memory.
> Here is the tradeoff: each batch takes more than 8KB (8KB makes sense
> for performance reasons), the in-memory hash table takes memory as well,
> and splitting batches may (but need not) reduce the in-memory hash table
> size while introducing more batches (and thus more memory usage,
> 8KB * #batches).
>
> Can we conclude that splitting is worthwhile when:
> (the reduced memory of the in-memory hash table) - (8KB * number of new
> batches) > 0
>
> So I'm considering combining our patch with your patch to fix the join
> OOM problem, no matter whether the OOM is caused by (the memory usage of
> the in-memory hash table) or by (8KB * number of batches).
>
> nbatch_inmemory in your patch could also be redefined using the rule
> above.
>
> What's your opinion?
>
> Thanks
>
> Hubert Zhang

-- 
Thanks

Hubert Zhang
0001-Allow-to-continue-to-split-batch-when-tuples-become-.patch
Description: Binary data