On 09/10/2014 01:49 AM, Tomas Vondra wrote:
On 9.9.2014 16:09, Robert Haas wrote:
On Mon, Sep 8, 2014 at 5:53 PM, Tomas Vondra <t...@fuzzy.cz> wrote:
So I only posted the separate patch for those who want to do a review,
and then a "complete patch" with both parts combined. But it sure may be
a bit confusing.

Let's do this: post each new version of the patches only on this
thread, as a series of patches that can be applied one after another.

OK, attached. Apply in this order

1) dense-alloc-v5.patch
2) hashjoin-nbuckets-v12.patch

The dense-alloc-v5.patch looks good to me. I have committed that with minor cleanup (more comments below). I have not looked at the second patch.

I also did a few 'minor' changes to the dense allocation patch, most
notably:

* renamed HashChunk/HashChunkData to MemoryChunk/MemoryChunkData
   The original naming seemed a bit awkward.

That's too easy to confuse with regular MemoryContext and AllocChunk stuff. I renamed it to HashMemoryChunk.

* the chunks size is 32kB (instead of 16kB), and we're using 1/4
   threshold for 'oversized' items

   We need the threshold to be >=8kB, to trigger the special case
   within AllocSet. The 1/4 rule is consistent with ALLOC_CHUNK_FRACTION.

Should we care about the fact that if there are only a few tuples, we will nevertheless waste 32kB of memory for the chunk? I guess not, but I thought I'd mention it. The smallest allowed value for work_mem is 64kB.

I also considered Heikki's suggestion that while rebatching, we can
reuse chunks with a single large tuple, instead of copying it
needlessly. My attempt however made the code much uglier (I almost used
a GOTO, but my hands rejected to type it ...). I'll look into that.

I tried constructing a test case where the rehashing time would be significant enough for this to matter, but I couldn't find one. Even if the original estimate on the number of batches is way too small, we ramp up quickly to a reasonable size and the rehashing time is completely insignificant compared to all the other work. So I think we can drop that idea.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to