Hi,

On 06/11/15 16:28, Robert Haas wrote:
> On Wed, Jun 10, 2015 at 10:57 PM, Kouhei Kaigai <kai...@ak.jp.nec.com> wrote:
>> The attached patch replaces this palloc0() by MemoryContextAllocHuge() +
>> memset().
>> Indeed, this hash table is constructed towards the relation with
>> nrows=119994544,
>> so, it is not strange even if hash-slot itself is larger than 1GB.
>
> You forgot to attach the patch, I think.  It looks to me like the size
> of a HashJoinTuple is going to be 16 bytes, so 1GB/16 = ~64 million.
> That's a lot of buckets, but maybe not unreasonably many if you've got
> enough memory.

Actually, HashJoinTuple is just a pointer, so it's 8 bytes, which means a 1GB array holds 134217728 buckets (2^27), more than the 119994544 rows from the plan.
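
Just to spell out the arithmetic (a back-of-the-envelope check, assuming nbuckets gets rounded up to the next power of two, which matches the 1GB in the error message):

  Size    nbuckets  = 134217728;                 /* 2^27, next power of 2 >= 119994544 */
  Size    allocsize = nbuckets * sizeof(void *); /* 134217728 * 8 = 1073741824 = 1GB */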

Also, looking at the error message again:

    ERROR:  invalid memory alloc request size 1073741824

but this time with beer goggles, I noticed that the amount reported is exactly 1GB. The backtrace also shows the error happens right inside ExecHashTableCreate (and not in the resize, which may happen later), which means it gets nbuckets from ExecChooseHashTableSize directly.

The resize is probably still broken, as I mentioned before, but this crashes before reaching that code, because the estimates are high enough to trigger the issue already in the initial allocation. ExecChooseHashTableSize is supposed to keep all the checks from previous versions, and I think it actually does.
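
For illustration, this is roughly the path it takes, as I read it (simplified, not a verbatim quote):

  /* ExecHashTableCreate(), the initial bucket array */
  hashtable->buckets = (HashJoinTuple *)
      palloc0(nbuckets * sizeof(HashJoinTuple));

  /* palloc0() eventually runs this check, with MaxAllocSize = 1GB - 1 */
  if (!AllocSizeIsValid(size))
      elog(ERROR, "invalid memory alloc request size %zu", size);

With nbuckets=134217728 the request is exactly 1073741824 bytes, one byte over MaxAllocSize, hence the ERROR.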

But I don't see any checks regarding the 1GB boundary in there. What I see is this:

  max_pointers = (work_mem * 1024L) / sizeof(void *);
  max_pointers = Min(max_pointers, INT_MAX / 2);

  ...

  dbuckets = Min(dbuckets, max_pointers);

That has nothing to do with 1GB, and it's been in the code since the time work_mem was limited to 2GB, so perhaps the reasoning was that it's sufficient (because the tuples stored in the hash table will need more than half of the memory, or something like that).

But today this issue is more likely, because people have more RAM and use higher work_mem values, so the max_pointers value gets much higher. In the extreme it may reach INT_MAX/2, i.e. ~1 billion pointers, and a bucket array that large would be ~8GB on 64-bit machines. (On 32-bit machines we'd get twice the number of pointers for the same work_mem, compared to 64 bits, but that's mostly irrelevant because of the memory size limits there.)
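
To put a number on that (work_mem value picked purely to illustrate the arithmetic):

  /* hypothetical work_mem = 8GB, i.e. 8388608 kB */
  long    max_pointers = (8388608 * 1024L) / sizeof(void *);  /* = 1073741824 */

  max_pointers = Min(max_pointers, INT_MAX / 2);              /* = 1073741823 */
  /* a bucket array of that many 8-byte pointers is ~8GB */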

It's also true that the hash-join improvements in 9.5, namely the decrease of NTUP_PER_BUCKET from 10 to 1, made this error more likely. With 9.4 we'd use only 16777216 buckets (128MB), because that already gets us below 10 tuples per bucket. But now we're shooting for 1 tuple per bucket, so we end up with 134217728 buckets, and that bucket array is exactly 1GB.
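
For comparison, the 9.4 numbers for the same plan (same back-of-the-envelope style as above):

  /* 9.4, NTUP_PER_BUCKET = 10 */
  Size    nbuckets_94 = 16777216;                     /* 2^24, next power of 2 >= ceil(119994544 / 10) */
  Size    array_94    = nbuckets_94 * sizeof(void *); /* 128MB, safely under the 1GB limit */

versus the 2^27 buckets / 1GB bucket array computed above for 9.5.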

I see two ways to fix this:

(1) enforce the 1GB limit (probably better for back-patching, if that's
    necessary); see the sketch right after this list

(2) make it work with hash tables over 1GB
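
For (1), a minimal sketch (hypothetical, just mirroring the existing clamps in ExecChooseHashTableSize) would be one more Min() against MaxAllocSize:

  max_pointers = (work_mem * 1024L) / sizeof(void *);
  max_pointers = Min(max_pointers, MaxAllocSize / sizeof(void *));
  max_pointers = Min(max_pointers, INT_MAX / 2);

That keeps the bucket array under the palloc() limit, at the price of more tuples per bucket for very large inner relations.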

I'm in favor of (2), if there's a good way to do that. It seems a bit stupid not to be able to use a fast hash table just because of an artificial allocation limit. Are there any fundamental reasons not to use the MemoryContextAllocHuge fix proposed by KaiGai-san?
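
And for (2), presumably something along these lines in ExecHashTableCreate (a sketch of what I understand the patch does, since the patch itself didn't make it to the list):

  /* allow bucket arrays larger than 1GB */
  hashtable->buckets = (HashJoinTuple *)
      MemoryContextAllocHuge(hashtable->batchCxt,
                             nbuckets * sizeof(HashJoinTuple));
  memset(hashtable->buckets, 0, nbuckets * sizeof(HashJoinTuple));

The repalloc() in the resize path would presumably need the equivalent treatment (repalloc_huge), per the point about the resize above.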


--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

