Re: [HACKERS] Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

2009-03-20 Thread Bryce Cutt
The estimation functions assume the inner relation's join column is
unique.  But the skew buckets are frozen (flushed back to the main hash
table) one at a time, in order of least importance, so if 100 inner
tuples can fit in the skew buckets, the skew buckets are only fully
blown out if the best tuple (the single most common value) occurs more
than 100 times in the inner relation.  And up until that point you
still have the tuples in memory that give the best benefit "per tuple's
worth of memory".  But yes, after that point (more than 100 tuples of
that best MCV) the entire effort was wasted.  The skew buckets are
dynamically flushed just like buckets in a dynamic hash join would be.

- Bryce Cutt

On Fri, Mar 20, 2009 at 5:51 PM, Robert Haas  wrote:
> On Fri, Mar 20, 2009 at 8:45 PM, Bryce Cutt  wrote:
>> On Fri, Mar 20, 2009 at 5:35 PM, Robert Haas  wrote:
>>> If the inner relation isn't fairly close to unique you shouldn't be
>>> using this optimization in the first place.
>> Not necessarily true.  Seeing as (when the statistics are correct) we
>> know each of these inner tuples will match with the largest number of
>> outer tuples, it is just as much of a win per inner tuple as when they
>> are unique.  There is just a chance you will have to give up on the
>> optimization partway through if too many inner tuples fall into the
>> new "skew buckets" (formerly IM buckets) and dump the tuples back into
>> the main buckets.  The potential win is still pretty high though.
>>
>> - Bryce Cutt
>
> Maybe I'm remembering wrong, but I thought the estimating functions
> assumed that the inner relation was unique.  So if there turn out to
> be 2, 3, 4, or more copies of each value, the chances of blowing out
> the skew hash table are almost 100%, I would think...  am I wrong?
>
> ...Robert
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

2009-03-20 Thread Bryce Cutt
Not necessarily true.  Seeing as (when the statistics are correct) we
know each of these inner tuples will match with the largest number of
outer tuples, it is just as much of a win per inner tuple as when they
are unique.  There is just a chance you will have to give up on the
optimization partway through if too many inner tuples fall into the
new "skew buckets" (formerly IM buckets) and dump the tuples back into
the main buckets.  The potential win is still pretty high though.

- Bryce Cutt


On Fri, Mar 20, 2009 at 5:35 PM, Robert Haas  wrote:
> On Fri, Mar 20, 2009 at 8:14 PM, Tom Lane  wrote:
>> Bryce Cutt  writes:
>>> Here is the new patch.
>>
>> Applied with revisions.  I undid some of the "optimizations" that
>> cluttered the code in order to save a cycle or two per tuple --- as per
>> previous discussion, that's not what the performance questions were
>> about.  Also, I did not like the terminology "in-memory"/"IM"; it seemed
>> confusing since the main hash table is in-memory too.  I revised the
>> code to consistently refer to the additional hash table as a "skew"
>> hashtable and the optimization in general as skew optimization.  Hope
>> that seems reasonable to you --- we could search-and-replace it to
>> something else if you'd prefer.
>>
>> For the moment, I didn't really do anything about teaching the planner
>> to account for this optimization in its cost estimates.  The initial
>> estimate of the number of MCVs that will be specially treated seems to
>> me to be too high (it's only accurate if the inner relation is unique),
>> but getting a more accurate estimate seems pretty hard, and it's not
>> clear it's worth the trouble.  Without that, though, you can't tell
>> what fraction of outer tuples will get the short-circuit treatment.
>
> If the inner relation isn't fairly close to unique you shouldn't be
> using this optimization in the first place.
>
> ...Robert
>



Re: [HACKERS] Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

2009-03-02 Thread Bryce Cutt
Here is the new patch.

Our experiments show no noticeable performance issue when the patch is
applied but the optimization is not used, because the number of extra
statements executed while the optimization is disabled is
insignificant.

We have updated the patch to remove a couple of if statements, but
this is really minor.  The biggest change was to MultiExecHash, which
now avoids an if check per tuple by duplicating the hashing loop.

To demonstrate the differences, here is an analysis of the code
changes and their impact.

Three cases:
1) One batch hash join - Optimization is disabled.  Extra statements
executed are:
 - One if (hashtable->nbatch > 1) in ExecHashJoin (line 356 of nodeHashjoin.c)
 - One if optimization_on in MultiExecHash (line 259 of nodeHash.c)
 - One if optimization_on in ExecHashJoin per probe tuple (line 431
of nodeHashjoin.c)
 - One if statement in ExecScanHashBucket per probe tuple (line 1071
of nodeHash.c)

2) Multi-batch hash join with limited skew - Optimization is disabled.
 Extra statements executed are:
 - One if (hashtable->nbatch > 1) in ExecHashJoin (line 356 of nodeHashjoin.c)
 - Executes ExecHashJoinDetectSkew method (at line 357 of
nodeHashjoin.c) that reads stats tuple for probe relation attribute
and determines if skew is above cut-off.  In this case, skew is not
above cutoff and no extra memory is used.
 - One if optimization_on in MultiExecHash (line 259 of nodeHash.c)
 - One if optimization_on in ExecHashJoin per probe tuple (line 431
of nodeHashjoin.c)
 - One if statement in ExecScanHashBucket per probe tuple (line 1071
of nodeHash.c)

3) Multi-batch hash join with skew - Optimization is enabled. Extra
statements executed are:
 - One if (hashtable->nbatch > 1) in ExecHashJoin (line 356 of nodeHashjoin.c)
 - Executes ExecHashJoinDetectSkew method (at line 357 of
nodeHashjoin.c) that reads stats tuple for probe relation attribute
and determines there is skew.  Allocates space for XXX which is 2% of
work_mem.
 - One if optimization_on in MultiExecHash (line 259 of nodeHash.c)
 - In MultiExecHash after each tuple is hashed determines if its join
attribute value matches one of the MCVs.  If it does, it is put in the
MCV structure.  Cost is the hash and search for each build tuple.
 - If all IM buckets end up frozen in the build phase (MultiExecHash)
because they grow larger than the memory allowed for IM buckets, then
the skew optimization is turned off and the probe phase reverts to
Case 2.
 - For each probe tuple, determines if its value is a MCV by
performing hash and quick table lookup.  If yes, probes MCV bucket
otherwise does regular hash algorithm as usual.
 - One if statement in ExecScanHashBucket per probe tuple (line 1071
of nodeHash.c)
 - Additional cost is determining if a tuple is a common tuple (both
on build and probe side).  This additional cost is dramatically
outweighed by avoiding disk I/Os (even if they never hit the disk due
to caching).

The if statement on line 440 of nodeHashjoin.c (in ExecHashJoin) has
been rearranged so that, in the single-batch case, short-circuit
evaluation requires checking only the first test in the if.

The "limited skew" check mentioned in Case 2 above is a simple check
in the ExecHashJoinDetectSkew function.

- Bryce Cutt



On Thu, Feb 26, 2009 at 12:16 PM, Bryce Cutt  wrote:
> The patch originally modified the cost function but I removed that
> part before we submitted it to be a bit conservative about our
> proposed changes.  I didn't like that for large plans the statistics
> were retrieved and calculated many times when finding the optimal
> query plan.
>
> The overhead of the algorithm when the skew optimization is not used
> ends up being roughly a function call and an if statement per tuple.
> It would be easy to remove the function call per tuple.  Dr. Lawrence
> has come up with some changes so that when the optimization is turned
> off, the function call does not happen at all and instead of the if
> statement happening per tuple it is run just once per join.  We have
> to test this a bit more but it should further reduce the overhead.
>
> Hopefully we will have the new patch ready to go this weekend.
>
> - Bryce Cutt
>
>
> On Thu, Feb 26, 2009 at 7:45 AM, Tom Lane  wrote:
>> Heikki's got a point here: the planner is aware that hashjoin doesn't
>> like skewed distributions, and it assigns extra cost accordingly if it
>> can determine that the join key is skewed.  (See the "bucketsize" stuff
>> in cost_hashjoin.)  If this patch is accepted we'll want to tweak that
>> code.
>>
>> Still, that has little to do with the current gating issue, which is
>> whether we've convinced ourselves that the patch doesn't cause a
>> performance decrease for cases in which it's unable to help.
>>
>                        regards, tom lane

Re: [HACKERS] Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

2009-02-26 Thread Bryce Cutt
The patch originally modified the cost function but I removed that
part before we submitted it to be a bit conservative about our
proposed changes.  I didn't like that for large plans the statistics
were retrieved and calculated many times when finding the optimal
query plan.

The overhead of the algorithm when the skew optimization is not used
ends up being roughly a function call and an if statement per tuple.
It would be easy to remove the function call per tuple.  Dr. Lawrence
has come up with some changes so that when the optimization is turned
off, the function call does not happen at all and instead of the if
statement happening per tuple it is run just once per join.  We have
to test this a bit more but it should further reduce the overhead.

Hopefully we will have the new patch ready to go this weekend.

- Bryce Cutt


On Thu, Feb 26, 2009 at 7:45 AM, Tom Lane  wrote:
> Heikki's got a point here: the planner is aware that hashjoin doesn't
> like skewed distributions, and it assigns extra cost accordingly if it
> can determine that the join key is skewed.  (See the "bucketsize" stuff
> in cost_hashjoin.)  If this patch is accepted we'll want to tweak that
> code.
>
> Still, that has little to do with the current gating issue, which is
> whether we've convinced ourselves that the patch doesn't cause a
> performance decrease for cases in which it's unable to help.
>
>                        regards, tom lane
>



Re: [HACKERS] Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

2009-01-06 Thread Bryce Cutt
The latest version of the patch is attached.  The revision considerably
cleans up the code, especially variable naming consistency.  We have
adopted the use of IM (in-memory) in variable names for the hash table
structures as suggested.

Two other implementation changes:

1) The overhead of the hash table has been reduced by allocating an
array of pointers instead of an array of structs, and only allocating
the structs as they are needed to store MCVs.  IM buckets are now
frozen by first removing all tuples and then deleting the struct from
memory.  This frees more memory and removes the frozen field from the
IM bucket struct, which makes that struct only 8 bytes on a 32-bit
machine.  If for some reason all IM buckets are frozen, all IM struct
overhead is removed from memory to further reduce the memory
footprint.

2) This patch supports using a set percentage of work_mem (currently 2%)
to store the build tuples that join frequently with probe relation
tuples.  The code only allocates MCVs up to the maximum amount and will
flush from the in-memory hash table if the memory is ever exceeded.  The
code also ensures that the overall join memory used (the MCV hash table
and batch 0 in memory) does not exceed spaceAllocated as usual.  If this 2%
of memory is not used by the MCV hash table then it can be used by batch 0.

These changes mostly relate to style, although some of the cleanup
has made the code slightly faster.

We would really appreciate help on finalizing this patch, especially in
regard to style issues.  Thank you for all the help.

- Dr. Ramon Lawrence and Bryce Cutt


On Sun, Jan 4, 2009 at 6:48 PM, Robert Haas  wrote:
>> 1) Isn't ExecHashFreezeNextMCVPartition actually a most common TUPLE
>> partition, rather than a most common VALUE partition (and similarly for
>> ExecHashGetMCVPartition)?
>>
>> A partition stores all tuples that correspond to that MCV value.  It is
>> usually one for foreign key joins but may be more than one.  (Plus, it
>> may store other tuples that have the same hash value for the join
>> attribute as the MCV value.)
>
> I guess my point is - check that your variable/function/structure
> member naming is consistent between different parts of the code.  The
> ExecHashGetMCVPartition function accesses structure members called
> nMostCommonTuplePartitionHashBuckets, nMostCommonTuplePartition, and
> mostCommonTuplePartition.  It seems inconsistent that the function
> name uses MCVPartition and
> the structure members use mostCommonTuplePartition - aren't we talking
> about the same thing in both cases?
>
> And, more to the point, the terminology just seems wrong to me, the
> more I think about it.  I mean, ExecHashGetMCVParitition is not
> finding a partition of the MCVs.  It's finding a partition of an
> in-memory hash table which we plan to populate with MCVs.   That's why
> I'm wondering if we should make it ExecHashGetIMPartition,
> nIMPartitionHashBuckets, etc.
>
>> 2) Have you done any performance testing on the impact of this change?
>>
>> Yes, the ability to use MCVs for more than sequential scans
>> significantly improves performance in multi-join cases.  The allocation
>> of a percentage of memory of only 1% will not affect any performance
>> results as all our testing was done with the MCV value of 10 or 100
>> which is significantly below a 1% allocation of work_mem.  If anything,
>> performance would be improved when using more MCVs.
>
> That is a very good thing.
>
>> Finally, any help you can provide on style concerns to make this easier
>> to commit would be appreciated.  We will put all the effort required
>> over the next few days to get this into 8.4.
>
> If I have time, I might be willing to make a style run over the next
> version of the patch after you post it to the list, and just correct
> anything I see and repost.  This might be faster than sending comments
> back and forth, if you are OK with it.  I have a day job so this would
> probably need to be Tuesday or Wednesday night.  My main advice is
> "read the diff before you post it".  Sometimes things will just pop
> out at you that are less obvious when you are head-down in the code.
>
> Random stuff I notice in v4 patch: make sure all lines fit in 80
> columns (except for long error messages if any), missing space before
> closing comment delimiter in ExecHashGetMCVPartition, extraneous blank
> line added to nodeHash.c just before the comment that says "and remove
> from hash table", comment in ExecHashJoinGetMostCommonValues just
> after the get_attstatsslot call is formatted strangely, still extra
> curly braces around the calls to
> ExecScanHashMostCommonValuePartition/ExecScanHashBucket.
>
> ...Robert
>


histojoin_v5.patch
Description: Binary data



Re: [HACKERS] Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

2008-12-30 Thread Bryce Cutt
Here is the next patch version.

The naming and style concerns have been addressed.  The patch now only
touches 5 files.  4 of those files are hashjoin specific and 1 is to
add a couple lines to a hashjoin specific struct in another file.

The code can now find the MCVs in more cases.  Even if the probe
side is an operator other than a seq scan (such as another hash join),
the code can now find the stats tuple for the underlying relation.

The new idea of limiting the number of MCVs to a percentage of memory
has not been added yet.

- Bryce Cutt


On Mon, Dec 29, 2008 at 8:55 PM, Robert Haas  wrote:
>> I think that setting aside a minimum percentage of work_mem may be a
>> reasonable approach.  For instance, setting aside 1% at even 1 MB
>> work_mem would be 10 KB which is enough to store about 40 MCV tuples of
>> the TPC-H database.  Such a small percentage would be very unlikely (but
>> still possible) to change the number of batches used.  Then, given the
>> memory allocation and the known tuple size + overhead, only that number
>> of MCVs are selected for the MCV table regardless how many there are.
>> The MCV table size would then increase as work_mem is changed up to a
>> maximum given by the number of MCVs.
>
> Sounds fine.  Maybe 2-3% would be better.
>
>> The code when building the MCV hash table keeps track of the order of
>> insertion of the best MCVs.  It then flushes the MCV partitions in
>> decreasing order of frequency of MCVs.  Thus, by the end of the build
>> partitioning phase the MCV hash table should only store the most
>> frequent MCV tuples.  Even with many-to-many joins as long as we keep
>> all build tuples that have a given MCV in memory, then everything is
>> fine.  You would get into problems if you only flushed some of the
>> tuples of a certain MCV but that will not happen.
>
> OK, I'll read it again - I must not have understood.
>
> It would be good to post an updated patch soon, even if not everything
> has been addressed.
>
> ...Robert
>


histojoin_v4.patch
Description: Binary data



Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

2008-12-23 Thread Bryce Cutt
Because there is no nice way in PostgreSQL (that I know of) to derive
a histogram after a join (on an intermediate result),
usingMostCommonValues is currently only enabled on a join when the
outer (probe) side is a table scan (actually a seq scan only).  See
getMostCommonValues (soon to be called
ExecHashJoinGetMostCommonValues) for the logic that determines this.

Here is the result of EXPLAIN (on a 100MB TPC-H dataset):
"Hash Left Join  (cost=16232.00..91035.00 rows=60 width=526)"
"  Hash Cond: (l.l_partkey = p.p_partkey)"
"  ->  Hash Left Join  (cost=15368.00..75171.00 rows=60 width=395)"
"Hash Cond: (l.l_orderkey = o.o_orderkey)"
"->  Seq Scan on lineitem l  (cost=0.00..17867.00 rows=60
width=125)"
"->  Hash  (cost=8073.00..8073.00 rows=15 width=270)"
"  ->  Hash Left Join  (cost=700.50..8073.00 rows=15 width=270)"
"Hash Cond: (o.o_custkey = c.c_custkey)"
"->  Seq Scan on orders o  (cost=0.00..4185.00
rows=15 width=109)"
"->  Hash  (cost=513.00..513.00 rows=15000 width=161)"
"  ->  Seq Scan on customer c
(cost=0.00..513.00 rows=15000 width=161)"
"  ->  Hash  (cost=614.00..614.00 rows=2 width=131)"
"->  Seq Scan on part p  (cost=0.00..614.00 rows=2 width=131)"

If you take a look at the explain for that join you will see that the
first of the relations joined are orders and customer on custkey.
There is almost no skew in the o_custkey attribute of orders even in
the Z2 dataset so the difference between hashjoin with and without
usingMostCommonValues enabled is quite small.

The second join performed is to join the result of the first join with
lineitem on orderkey.  There is no skew at all in the l_orderkey
attribute of lineitem so usingMostCommonValues can not help at all.

The third join performed is to join the result of the second join with
part on partkey.  There is a lot of skew in the l_partkey attribute of
lineitem but because the probe side of the third join is an
intermediate from the second join and not a seq scan the algorithm
cannot figure out the MCVs of the probe side.

So on the query presented almost no skew can be exploited on the first
join and no other joins can have their skew exploited at all because
of the order PostgreSQL does the joins in.  Basically yes, you would
not see any real benefit from using the most common values on this
query.

We experimented with sampling (mentioned in the paper) to make an
educated guess of MCVs on intermediary results but found that because
a random sample could not be obtained the results were always very
inaccurate.  I basically just read a percentage of tuples from the
probe relation before partitioning the build relation, derived the
MCVs in a single pass, wrote the tuples back out to a temp file
(because reading back from here is less expensive than resetting the
probe side tree), then did the join as usual while remembering to read
back from my temp file before reading the rest of the probe side
tuples.  Our tests indicate that sampling is not likely a good
solution for deriving MCVs from intermediary results.

In the Java implementation of histojoin we experimented with
exploiting skew in multiple joins of a star join with some success
(detailed in paper).  I am not sure how this would be accomplished
nicely in PostgreSQL.

If the cost functions knew how to order the joins to make the best use
of skew in the relations, PostgreSQL could benefit from histojoin more
often, in cases where doing the skewed join first would be faster than
doing the smaller join first.  This change could be a future addition
to PostgreSQL if this patch is accepted.  It relies on getting the
stats tuple for the join during the planning phase (in the cost
function) and estimating the benefit skew would have on the join cost.

- Bryce Cutt


On Mon, Dec 22, 2008 at 6:15 AM, Joshua Tolley  wrote:
> On Sun, Dec 21, 2008 at 10:25:59PM -0500, Robert Haas wrote:
>> [Some performance testing.]
>
> I (finally!) have a chance to post my performance testing results... my
> apologies for the really long delay. 
>
> Unfortunately I'm not seeing wonderful speedups with the particular
> queries I did in this case. I generated three 1GB datasets, with skews
> set at 1, 2, and 3. The test script I wrote turns on enable_usestatmcvs
> and runs EXPLAIN ANALYZE on the same query five times. Then it turns
> enable_usestatmcvs off, and runs the same query five more times. It does
> this with each of the three datasets in turn, and then starts over at
> the beginning until I tell it to quit. My results showed a statistically
> significant improvement in speed only on the skew == 3 dataset.
>
> I did the 

Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

2008-12-20 Thread Bryce Cutt
Robert,

I thoroughly appreciate the constructive criticism.

The compile errors are due to my development process being convoluted.
I will endeavor not to waste your time in the future with errors it
causes.

I have updated the code to follow the conventions and suggestions
given.  I am now working on adding the requested documentation.  I
will not submit the next patch until that is done.  The functionality
has not changed so you can performance test with the patch you have.

As for that particularly ugly piece of code: I figured that out while
digging through the selfuncs code.  Basically I needed a way to get
the stats tuple for the outer relation join column of the join but to
do that I needed to figure out how to get the actual relation id and
attribute number that was being joined.

I have not yet figured out a better way to do this but I am sure there
is someone on the mailing list with far more knowledge of this than I
have.

I could possibly be more vigorous in testing to make sure the things I
am casting are exactly what I expect.  My tests have always been
consistent so far.

I am essentially doing what is done in selfuncs.  I believe I could
use the examine_variable() function in selfuncs.c except I would first
need a PlannerInfo and I don't think I can get that from inside the
join initialization code.

- Bryce Cutt


On Mon, Dec 15, 2008 at 8:51 PM, Robert Haas  wrote:
> I have to admit that I haven't fully grokked what this patch is about
> just yet, so what follows is mostly a coding style review at this
> point.  It would help a lot if you could add some comments to the new
> functions that are being added to explain the purpose of each at a
> very high level.  There's clearly been a lot of thought put into some
> parts of this logic, so it would be worth explaining the reasoning
> behind that logic.
>
> This patch applies clearly against CVS HEAD, but does not compile
> (please fix the warning, too).
>
> nodeHash.c:88: warning: no previous prototype for 'freezeNextMCVPartiton'
> nodeHash.c: In function 'freezeNextMCVPartiton':
> nodeHash.c:148: error: 'struct HashJoinTableData' has no member named 
> 'inTupIOs'
>
> I commented out the offending line.  It errored out again here:
>
> nodeHashjoin.c: In function 'getMostCommonValues':
> nodeHashjoin.c:136: error: wrong type argument to unary plus
>
> After removing the stray + sign, it compiled, but failed the
> "rangefuncs" regression test.  If you're going to keep the
> enable_hashjoin_usestatmvcs() GUC around, you need to patch
> rangefuncs.out so that the regression tests pass.  I think, however,
> that there was some discussion of removing that before the patch is
> committed; if so, please do that instead.  Keeping the GUC would also
> require patching the documentation, which the current patch does not
> do.
>
> getMostCommonValues() isn't a good name for a non-static function
> because there's nothing to tip the reader off to the fact that it has
> something to do with hash joins; compare with the other function names
> defined in the same header file.  On the flip side, that function has
> only one call site, so it should probably be made static and not
> declared in the header file at all.  Some of the other new functions
> need similar treatment.  I am also a little suspicious of this bit of
> code:
>
>relid = getrelid(((SeqScan *) ((SeqScanState *)
> outerPlanState(hjstate))->ps.plan)->scanrelid,
> estate->es_range_table);
>clause = (FuncExprState *) lfirst(list_head(hjstate->hashclauses));
>argstate = (ExprState *) lfirst(list_head(clause->args));
>variable = (Var *) argstate->expr;
>
> I'm not very familiar with the hash join code, but it seems like there
> are a lot of assumptions being made there about what things are
> pointing to what other things.  Is this this actually safe?  And if it
> is, perhaps a comment explaining why?
>
> getMostCommonValues() also appears to be creating and maintaining a
> counter called collisionsWhileHashing, but nothing is ever done with
> the counter.  On a similar note, the variables relattnum, atttype, and
> atttypmod don't appear to be necessary; 2 out of 3 of them are only
> used once, so maybe inlining the reference and dropping the variable
> would make more sense.  Also, the if (HeapTupleIsValid(statsTuple))
> block encompasses the whole rest of the function, maybe if
> (!HeapTupleIsValid(statsTuple)) return?
>
> I don't understand why
> hashtable->mostCommonTuplePartition[bucket].tuples and .frozen need to
> be initialized to 0.  It looks to me like those are in a zero-filled
> array that was just allocated, so it 

Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

2008-11-05 Thread Bryce Cutt
The error is caused by my asserting against the wrong variable.  I
never noticed this because I apparently did not have assertions turned
on on my development machine.  That is fixed now, and with the new
patch version I have attached, all assertions pass with your query and
my test queries.  I added another assertion to that section of the
code so that it is a bit more vigorous in confirming the hash table
partition is correct.  It does not change the operation of the code.

There are two partition counts.  One holds the maximum number of
buckets in the hash table and the other counts the number of actual
buckets created for hash values.  I was incorrectly testing against
the second one because that was valid before I started using a hash
table to store the buckets.

The enable_hashjoin_usestatmcvs flag was valuable for my own research
and tests and likely useful for your review but Tom is correct that it
can be removed in the final version.

- Bryce Cutt


On Wed, Nov 5, 2008 at 7:22 AM, Joshua Tolley <[EMAIL PROTECTED]> wrote:
>
> On Wed, Nov 5, 2008 at 8:20 AM, Tom Lane  wrote:
>> Joshua Tolley  writes:
>>> On Mon, Oct 20, 2008 at 03:42:49PM -0700, Lawrence, Ramon wrote:
>>>> We propose a patch that improves hybrid hash join's performance for large
>>>> multi-batch joins where the probe relation has skew.
>>
>>> I also recommend modifying docs/src/sgml/config.sgml to include the
>>> enable_hashjoin_usestatmcvs option.
>>
>> If the patch is actually a win, why would we bother with such a GUC
>> at all?
>>
>>regards, tom lane
>
> Good point. Leaving it in place for patch review purposes is useful,
> but we can probably lose it in the end.
>
> - Josh / eggyknap
>


histojoin_v2.patch
Description: Binary data
