Robert Haas wrote:
> On Mon, Jun 9, 2014 at 4:06 AM, b8flowerfire <b8flowerfire@> wrote:
>
> This has come up before. Basically, the problem is that if you reduce
> NTUP_PER_BUCKET, the bucket array gets larger, which might reduce the
> amount of space available for tuples to the point where the hash join
> overflows to multiple batches. That will be more expensive than
> performing the hash join with a higher NTUP_PER_BUCKET.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

Thanks for the explanation, but I don't find it entirely convincing. Reducing NTUP_PER_BUCKET does enlarge the pointer array and leave fewer tuples per batch, but is that effect significant for performance?

The utilization of work_mem, I think, is determined by the ratio of the pointer size to the tuple size. Assume a tuple is 28 bytes, which is reasonable: the sum of HJTUPLE_OVERHEAD (at least 8 bytes), MinimalTupleData (at least 10 bytes), and the tuple contents (say 10 bytes). A pointer is 4 bytes. So with NTUP_PER_BUCKET set to 10, about 28 * 10 / (28 * 10 + 4), or 98.6%, of work_mem is used to store tuples. With NTUP_PER_BUCKET set to 1, about 28 / (28 + 4), or 87.5%, of work_mem is used to store tuples, which is roughly 90% of the original. As a result, changing NTUP_PER_BUCKET to 1 may increase the number of batches by only about 10%. Is that enough to affect performance? Or maybe the calculation cannot be done so simply.

Besides, we have larger main memory now. If we set work_mem larger, wouldn't the extra batches introduced by a smaller NTUP_PER_BUCKET matter less?

I have read earlier discussion about NTUP_PER_BUCKET. It seems that even if we changed NTUP_PER_BUCKET to 50 or larger, performance would not be much worse, because every tuple in a bucket's chain stores its hash value.
Having more tuples in a bucket therefore only adds some comparisons of two integers. Does the same hold in the other direction, so that making it smaller would not gain us much either? Is that one of the reasons it is defined as 10?

After all, this is my first time taking part in a discussion in an open source community. Thank you all for the replies and the discussion.

Thanks.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/why-postgresql-define-NTUP-PER-BUCKET-as-10-not-other-numbers-smaller-tp5806472p5806617.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.