Re: [GENERAL] Why hash join cost calculation need reduction

2013-06-14 Thread Stephen Frost
* 高健 (luckyjack...@gmail.com) wrote:
[...]
 postgres=# explain analyze select * from sales s inner join customers c on
 s.cust_id = c.cust_id and c.cust_id = 2;
[...]
 When I use a where condition such as cust_id = 2,

 is PostgreSQL clever enough to know that it is better to do a seqscan
 and filter?

I haven't bothered to go look through the code specifics, but what I
expect is happening here is that PG realizes that c.cust_id and
s.cust_id are the same, and c.cust_id = 2, therefore the statement is
equivalent to:

explain analyze select * from sales s inner join customers c on
s.cust_id = 2 and c.cust_id = 2

and it's not going to try to build a hash to support an equality
operation against a constant; there's no point.  It can simply do a
nested loop with a filter, because all it needs to do is find all cases
of sales.cust_id = 2 and all cases of customers.cust_id = 2 and
return the cartesian product of them.
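
A quick way to convince yourself of this (a sketch I haven't actually
run, assuming small test tables like yours) is to spell the derived
constants out by hand and compare the plans:

explain analyze select * from sales s inner join customers c on
s.cust_id = 2 and c.cust_id = 2;

You should see the same nested-loop-with-filter shape as for your
original query with cust_id = 2.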

Thanks,

Stephen




[GENERAL] Why hash join cost calculation need reduction

2013-06-13 Thread 高健
Hi :


Sorry to disturb. I don't know if it is OK to put this question here.



I want to learn more about hash join's cost calculation.

I found the following function in PostgreSQL 9.2.1, where the hash join
cost is calculated.



But what confused me is a reduction calculation:



qp_qual_cost.per_tuple -= hash_qual_cost.per_tuple;



I can understand that when cost is evaluated, adding and multiplying
are needed.



My question is:

Why is the reduction needed here in the cost calculation?



In fact, for my SQL statement:

select * from sales s inner join customers c on s.cust_id = c.cust_id;
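
For reference, I haven't shown my table definitions. A minimal setup
such as the following (hypothetical, but it matches the row counts in
the plans later in this thread) is enough to follow along:

create table customers (cust_id int, cust_name varchar(10));
create table sales (cust_id int, item varchar(10));
insert into customers values (1,'a'), (2,'b'), (3,'c');
insert into sales values (1,'w'), (2,'x'), (3,'y'), (4,'z');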



When I set cpu_operator_cost to 0.0025,

qp_qual_cost.per_tuple and hash_qual_cost.per_tuple are both 0.0025.

So after the reduction, qp_qual_cost.per_tuple is set to 0.



When I set cpu_operator_cost to 0.0020,

qp_qual_cost.per_tuple and hash_qual_cost.per_tuple are both 0.0020.

So after the reduction, qp_qual_cost.per_tuple is set to 0.
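
This is a sketch of the kind of session involved (the QualCost fields
themselves are not shown by EXPLAIN, so watching them requires a
debugger breakpoint in final_cost_hashjoin; only the resulting plan
costs are visible):

set cpu_operator_cost = 0.0025;
explain select * from sales s inner join customers c on s.cust_id = c.cust_id;
set cpu_operator_cost = 0.0020;
explain select * from sales s inner join customers c on s.cust_id = c.cust_id;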


I think the per_tuple cost cannot be omitted here.



The following is the part of the source related to qp_qual_cost.per_tuple.

---

void
final_cost_hashjoin(PlannerInfo *root, HashPath *path,
                    JoinCostWorkspace *workspace,
                    SpecialJoinInfo *sjinfo,
                    SemiAntiJoinFactors *semifactors)
{
    …
    Cost        cpu_per_tuple;
    QualCost    hash_qual_cost;
    QualCost    qp_qual_cost;
    …
    /*
     * Compute cost of the hashquals and qpquals (other restriction clauses)
     * separately.
     */
    cost_qual_eval(&hash_qual_cost, hashclauses, root);
    cost_qual_eval(&qp_qual_cost, path->jpath.joinrestrictinfo, root);
    …
    qp_qual_cost.startup -= hash_qual_cost.startup;
    qp_qual_cost.per_tuple -= hash_qual_cost.per_tuple;
    …
    /*
     * For each tuple that gets through the hashjoin proper, we charge
     * cpu_tuple_cost plus the cost of evaluating additional restriction
     * clauses that are to be applied at the join.  (This is pessimistic since
     * not all of the quals may get evaluated at each tuple.)
     */
    startup_cost += qp_qual_cost.startup;
    cpu_per_tuple = cpu_tuple_cost + qp_qual_cost.per_tuple;
    …
    run_cost += cpu_per_tuple * hashjointuples;

    path->jpath.path.startup_cost = startup_cost;
    path->jpath.path.total_cost = startup_cost + run_cost;
    …
}

---

Thanks!


Re: [GENERAL] Why hash join cost calculation need reduction

2013-06-13 Thread Stephen Frost
Greetings,

* 高健 (luckyjack...@gmail.com) wrote:
 I found the following function in PostgreSQL 9.2.1, where the hash join
 cost is calculated.
 
 But what confused me is a reduction calculation:
 
 qp_qual_cost.per_tuple -= hash_qual_cost.per_tuple;
 
 My question is:
 
 Why is the reduction needed here in the cost calculation?

cost_qual_eval(&hash_qual_cost, hashclauses, root);

returns the costs for *just the quals which can be used for the
hashjoin*, while

cost_qual_eval(&qp_qual_cost, path->jpath.joinrestrictinfo, root);

returns the costs for *ALL the quals*

qp_qual_cost.startup -= hash_qual_cost.startup;

and

qp_qual_cost.per_tuple -= hash_qual_cost.per_tuple;

subtract the cost attributed to the quals used in the hashjoin, leaving
just the cost of the other quals in the overall expression.

The reason we do this is that we're going to use hashjoin-specific
costing for those qual costs later on in final_cost_hashjoin:

startup_cost += hash_qual_cost.startup;
run_cost += hash_qual_cost.per_tuple * outer_path_rows *
clamp_row_est(inner_path_rows * innerbucketsize) * 0.5;

If we didn't do that, we'd end up double-counting those costs.
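
As an illustration (my own sketch, not something from your example): if
the join also had a qual that can't be used as a hash clause, then
joinrestrictinfo would be a strict superset of hashclauses, and
qp_qual_cost.per_tuple would stay non-zero after the subtraction, e.g.:

explain select * from sales s inner join customers c
on s.cust_id = c.cust_id and s.cust_id + c.cust_id > 0;

Here the equality becomes the hash clause, while the inequality should
show up as a Join Filter charged once per joined tuple.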

 In fact, for my SQL statement:
 
 select * from sales s inner join customers c on s.cust_id = c.cust_id;
 
 When I set cpu_operator_cost to 0.0025,
 
 qp_qual_cost.per_tuple and hash_qual_cost.per_tuple are both 0.0025.
 
 So after the reduction, qp_qual_cost.per_tuple is set to 0.

Yes, because ALL the quals involved in your statement are quals being
used for the hashjoin, and those costs are calculated later on, as I
illustrated above.

 I think the per_tuple cost cannot be omitted here.

The per-tuple cost isn't omitted, it's added in later based on the
expected costs for doing those per-tuple operations for building and
using the hash table.

Thanks,

Stephen




Re: [GENERAL] Why hash join cost calculation need reduction

2013-06-13 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * 高健 (luckyjack...@gmail.com) wrote:
 Why is the reduction needed here in the cost calculation?

   cost_qual_eval(&hash_qual_cost, hashclauses, root);
 returns the costs for *just the quals which can be used for the
 hashjoin*, while
   cost_qual_eval(&qp_qual_cost, path->jpath.joinrestrictinfo, root);
 returns the costs for *ALL the quals*

Right.  Note what it says in create_hashjoin_path:

 * 'restrict_clauses' are the RestrictInfo nodes to apply at the join
 ...
 * 'hashclauses' are the RestrictInfo nodes to use as hash clauses
 *(this should be a subset of the restrict_clauses list)

So the two cost_qual_eval() calls are *both* counting the cost of the
hashclauses, and we have to undo that to get at just the cost of any
additional clauses besides the hash clauses.  See the comment about the
usage of qp_qual_cost further down:

/*
 * For each tuple that gets through the hashjoin proper, we charge
 * cpu_tuple_cost plus the cost of evaluating additional restriction
 * clauses that are to be applied at the join.  (This is pessimistic since
 * not all of the quals may get evaluated at each tuple.)
 */
startup_cost += qp_qual_cost.startup;
cpu_per_tuple = cpu_tuple_cost + qp_qual_cost.per_tuple;
run_cost += cpu_per_tuple * hashjointuples;

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why hash join cost calculation need reduction

2013-06-13 Thread 高健
Hello:

Thanks for replying!

I understand it a little better now.


And I compared the following statements:

First:

postgres=# explain analyze select * from sales s inner join customers c on
s.cust_id = c.cust_id;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.07..2.15 rows=3 width=84) (actual time=0.017..0.019 rows=3 loops=1)
   Hash Cond: (s.cust_id = c.cust_id)
   ->  Seq Scan on sales s  (cost=0.00..1.04 rows=4 width=42) (actual time=0.004..0.004 rows=4 loops=1)
   ->  Hash  (cost=1.03..1.03 rows=3 width=42) (actual time=0.004..0.004 rows=3 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Seq Scan on customers c  (cost=0.00..1.03 rows=3 width=42) (actual time=0.001..0.001 rows=3 loops=1)
 Total runtime: 0.046 ms
(7 rows)



Second:

postgres=# explain analyze select * from sales s inner join customers c on
s.cust_id = c.cust_id and c.cust_id = 2;
                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..2.10 rows=1 width=84) (actual time=0.000..0.000 rows=1 loops=1)
   ->  Seq Scan on sales s  (cost=0.00..1.05 rows=1 width=42) (actual time=0.000..0.000 rows=1 loops=1)
         Filter: (cust_id = 2)
         Rows Removed by Filter: 3
   ->  Seq Scan on customers c  (cost=0.00..1.04 rows=1 width=42) (actual time=0.000..0.000 rows=1 loops=1)
         Filter: (cust_id = 2)
         Rows Removed by Filter: 2
 Total runtime: 0.000 ms
(8 rows)



Third:

postgres=# explain analyze select * from sales s inner join customers c on
s.cust_id = c.cust_id and c.cust_id < 4;
                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..2.13 rows=1 width=84) (actual time=0.014..0.018 rows=3 loops=1)
   Join Filter: (s.cust_id = c.cust_id)
   Rows Removed by Join Filter: 9
   ->  Seq Scan on customers c  (cost=0.00..1.04 rows=1 width=42) (actual time=0.007..0.007 rows=3 loops=1)
         Filter: (cust_id < 4)
   ->  Seq Scan on sales s  (cost=0.00..1.04 rows=4 width=42) (actual time=0.000..0.000 rows=4 loops=3)
 Total runtime: 0.038 ms
(7 rows)



postgres=#



The first and third SQL statements really cause the final_cost_hashjoin
function to be called.



I think, for the third one above,

cost_qual_eval(&hash_qual_cost, hashclauses, root) is for s.cust_id = c.cust_id

and

cost_qual_eval(&qp_qual_cost, path->jpath.joinrestrictinfo, root) is for
s.cust_id = c.cust_id and c.cust_id < 4.



I've found the following call chain:

hash_inner_and_outer -> try_hashjoin_path -> create_hashjoin_path ->
final_cost_hashjoin



For the second SQL statement,

in the hash_inner_and_outer function,

the if (hashclauses) condition is false, so there is no chance to try a
hashjoin path.
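
A way to see that no hash join is even considered there (a hypothetical
session; I am reasoning from the code rather than from a confirmed test):

set enable_nestloop = off;
explain select * from sales s inner join customers c on
s.cust_id = c.cust_id and c.cust_id = 2;

Since the constant equality leaves no hashable join clause connecting
the two tables, the planner should still produce a nested loop (with a
large cost penalty) rather than a hash join.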



That is:

When I use a where condition such as cust_id = 2, is PostgreSQL clever
enough to know that it is better to do a seqscan and filter?
