Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-28 Thread Stefan Kaltenbrunner

Matteo Beccati wrote:

Stefan Kaltenbrunner wrote:

too bad - however any idea on one of the other troubling queries (q21) I
mentioned in the mail I resent to the list (after the original one got
lost)?

http://archives.postgresql.org/pgsql-hackers/2006-09/msg02011.php


What happens if you increase statistics for l_orderkey?


the statistics target is already at 1000 ...


Stefan

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-27 Thread Matteo Beccati

Stefan Kaltenbrunner wrote:

too bad - however any idea on one of the other troubling queries (q21) I
mentioned in the mail I resent to the list (after the original one got
lost)?

http://archives.postgresql.org/pgsql-hackers/2006-09/msg02011.php


What happens if you increase statistics for l_orderkey?


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-26 Thread Stefan Kaltenbrunner
Tom Lane wrote:
 Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 http://www.kaltenbrunner.cc/files/analyze_q9_beta1.txt
 
 The next problem seems to be the drastic misestimation of this join
 size:
 
 -  Nested Loop  (cost=0.00..6872092.36 rows=135 width=28) (actual 
 time=94.762..14429291.129 rows=3554044 loops=1)
   -  Merge Join  (cost=0.00..519542.74 rows=449804 width=16) (actual 
 time=48.197..49636.006 rows=474008 loops=1)
 Merge Cond: (part.p_partkey = partsupp.ps_partkey)
 -  Index Scan using pk_part on part  (cost=0.00..105830.22 
 rows=112447 width=4) (actual time=34.646..14381.644 rows=118502 loops=1)
   Filter: ((p_name)::text ~~ '%ghost%'::text)
 -  Index Scan using i_ps_partkey on partsupp  
 (cost=0.00..388943.05 rows=8000278 width=12) (actual time=13.511..22659.364 
 rows=7999685 loops=1)
   -  Index Scan using i_l_suppkey_partkey on lineitem  
 (cost=0.00..14.11 rows=1 width=24) (actual time=4.415..30.310 rows=7 
 loops=474008)
 Index Cond: ((partsupp.ps_partkey = lineitem.l_partkey) AND 
 (partsupp.ps_suppkey = lineitem.l_suppkey))
 
 With a factor-of-25000 error in that rowcount estimate, it's amazing the
 plans aren't worse than they are.
 
 It evidently thinks that most of the rows in the join of part and
 partsupp won't have any matching rows in lineitem, whereas on average
 there are about 7 matching rows apiece.  So that's totally wacko, and
 it's not immediately obvious why.  Could we see the pg_stats entries for
 part.p_partkey, partsupp.ps_partkey, partsupp.ps_suppkey,
 lineitem.l_partkey, lineitem.l_suppkey?


http://www.kaltenbrunner.cc/files/dbt3_beta1_statistics.txt


Stefan



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-26 Thread Tom Lane
Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 It evidently thinks that most of the rows in the join of part and
 partsupp won't have any matching rows in lineitem, whereas on average
 there are about 7 matching rows apiece.  So that's totally wacko, and
 it's not immediately obvious why.  Could we see the pg_stats entries for
 part.p_partkey, partsupp.ps_partkey, partsupp.ps_suppkey,
 lineitem.l_partkey, lineitem.l_suppkey?

 http://www.kaltenbrunner.cc/files/dbt3_beta1_statistics.txt

OK, so we have 2 million parts and 10 suppliers, and ANALYZE doesn't
seem to have been too far off at estimating either of those numbers.
I think the problem is that there are not very many suppliers for any
particular part, and thus the condition "part match AND supplier match"
is really not much more selective than "part match" alone.  The planner
is supposing that their selectivities are independent, which they
aren't.
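Tom's point can be illustrated with toy numbers. This is only a sketch: the supplier count below is assumed for illustration, and the only figure taken from the thread is the "exactly 4 suppliers for any given part" that Stefan reports elsewhere; the planner's real arithmetic lives in its C selectivity code, not here.

```python
# Toy illustration of the independence assumption described above.
n_parts = 2_000_000        # distinct part keys (per the thread)
n_suppliers = 100_000      # assumed distinct supplier keys (illustrative)
supp_per_part = 4          # each part is supplied by exactly 4 suppliers

# Selectivity of "partkey matches" for one candidate pair:
sel_part = 1.0 / n_parts

# Planner: treat "suppkey matches" as independent of the partkey clause.
estimated = sel_part * (1.0 / n_suppliers)

# Reality: once the partkey matches, the suppkey matches 1 time in 4.
actual = sel_part * (1.0 / supp_per_part)

print(actual / estimated)  # 25000.0 - the same order as the observed error
```

With these (assumed) numbers the multiplication of "independent" selectivities is off by exactly the factor-of-25000 seen in the rowcount estimate.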

Offhand I don't see any good way to fix this without multi-column
statistics, which is something that's certainly not happening for 8.2 :-(

regards, tom lane



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-26 Thread Stefan Kaltenbrunner
Tom Lane wrote:
 Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 It evidently thinks that most of the rows in the join of part and
 partsupp won't have any matching rows in lineitem, whereas on average
 there are about 7 matching rows apiece.  So that's totally wacko, and
 it's not immediately obvious why.  Could we see the pg_stats entries for
 part.p_partkey, partsupp.ps_partkey, partsupp.ps_suppkey,
 lineitem.l_partkey, lineitem.l_suppkey?
 
 http://www.kaltenbrunner.cc/files/dbt3_beta1_statistics.txt
 
 OK, so we have 2 million parts and 10 suppliers, and ANALYZE doesn't
 seem to have been too far off at estimating either of those numbers.
 I think the problem is that there are not very many suppliers for any
 particular part, and thus the condition "part match AND supplier match"
 is really not much more selective than "part match" alone.  The planner
 is supposing that their selectivities are independent, which they
 aren't.

looks like there are exactly 4 suppliers for any given part so that
seems indeed like the problem :-(

 
 Offhand I don't see any good way to fix this without multi-column
 statistics, which is something that's certainly not happening for 8.2 :-(

too bad - however any idea on one of the other troubling queries (q21) I
mentioned in the mail I resent to the list (after the original one got
lost)?

http://archives.postgresql.org/pgsql-hackers/2006-09/msg02011.php


Stefan



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-25 Thread Stefan Kaltenbrunner
[RESENT in a modified version since the original reply seems to have been
lost by the list server, which seems to happen sometimes lately]

Tom Lane wrote:
 Matteo Beccati [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Matteo Beccati [EMAIL PROTECTED] writes:
 I cannot see anything bad by using something like that:
 if (histogram is large/representative enough)
 Well, the question is exactly what is large enough?  I feel a bit
 uncomfortable about applying the idea to a histogram with only 10
 entries (especially if we ignore two of 'em).  With 100 or more,
 it sounds all right.  What's the breakpoint?
 
 Yes, I think 100-200 could be a good breakpoint.
 
 I've committed this change with (for now) 100 as the minimum histogram
 size to use.  Stefan, are you interested in retrying your benchmark?

spent some time retesting that and I got the following results (this is
the same box as before but with a much slower disk setup and a newly
initdb'd cluster):

http://www.kaltenbrunner.cc/files/analyze_q9_beta1.txt

all that is with 2GB of effective_cache_size (the plan does not change
with much smaller settings, btw), a statistics target of 1000, and the
following parameters:

default planner settings for the EXPLAIN and the first EXPLAIN ANALYZE -
then the same query with hash joins disabled, the next one with nested
loops disabled, and the fastest one with both nested loops and hash
joins disabled (all run in that order - so there are possible caching
effects).

in comparison to:

http://www.kaltenbrunner.cc/files/analyze_q9.txt

we nearly got a 7-figure speedup due to the latest changes (much better
estimates at least) - however, the mergejoin+sort-only plan is still faster.



the other troubling query is the following:

query: http://www.kaltenbrunner.cc/files/7/power1/db/plans/power_query21.txt

plans: http://www.kaltenbrunner.cc/files/analyze_q21_beta1.txt

(default,default,enable_nestloop=off,enable_nestloop=off and
enable_hashjoin=off)

despite having not-too-bad estimates for most of the key points in the
plan, the actual runtime of the chosen plan is quite disappointing.




Stefan



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-25 Thread Tom Lane
Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 http://www.kaltenbrunner.cc/files/analyze_q9_beta1.txt

The next problem seems to be the drastic misestimation of this join
size:

-  Nested Loop  (cost=0.00..6872092.36 rows=135 width=28) (actual 
time=94.762..14429291.129 rows=3554044 loops=1)
  -  Merge Join  (cost=0.00..519542.74 rows=449804 width=16) (actual 
time=48.197..49636.006 rows=474008 loops=1)
Merge Cond: (part.p_partkey = partsupp.ps_partkey)
-  Index Scan using pk_part on part  (cost=0.00..105830.22 
rows=112447 width=4) (actual time=34.646..14381.644 rows=118502 loops=1)
  Filter: ((p_name)::text ~~ '%ghost%'::text)
-  Index Scan using i_ps_partkey on partsupp  
(cost=0.00..388943.05 rows=8000278 width=12) (actual time=13.511..22659.364 
rows=7999685 loops=1)
  -  Index Scan using i_l_suppkey_partkey on lineitem  
(cost=0.00..14.11 rows=1 width=24) (actual time=4.415..30.310 rows=7 
loops=474008)
Index Cond: ((partsupp.ps_partkey = lineitem.l_partkey) AND 
(partsupp.ps_suppkey = lineitem.l_suppkey))

With a factor-of-25000 error in that rowcount estimate, it's amazing the
plans aren't worse than they are.

It evidently thinks that most of the rows in the join of part and
partsupp won't have any matching rows in lineitem, whereas on average
there are about 7 matching rows apiece.  So that's totally wacko, and
it's not immediately obvious why.  Could we see the pg_stats entries for
part.p_partkey, partsupp.ps_partkey, partsupp.ps_suppkey,
lineitem.l_partkey, lineitem.l_suppkey?

regards, tom lane



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-22 Thread Stefan Kaltenbrunner
Tom Lane wrote:
 Matteo Beccati [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Matteo Beccati [EMAIL PROTECTED] writes:
 I cannot see anything bad by using something like that:
 if (histogram is large/representative enough)
 Well, the question is exactly what is large enough?  I feel a bit
 uncomfortable about applying the idea to a histogram with only 10
 entries (especially if we ignore two of 'em).  With 100 or more,
 it sounds all right.  What's the breakpoint?
 
 Yes, I think 100-200 could be a good breakpoint.
 
 I've committed this change with (for now) 100 as the minimum histogram
 size to use.  Stefan, are you interested in retrying your benchmark?

sure - but I'm having hardware (harddisk firmware) related issues on my
testbox which will take a few further days to be resolved ...


Stefan



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-20 Thread Tom Lane
Matteo Beccati [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Matteo Beccati [EMAIL PROTECTED] writes:
 I cannot see anything bad by using something like that:
 if (histogram is large/representative enough)
 
 Well, the question is exactly what is large enough?  I feel a bit
 uncomfortable about applying the idea to a histogram with only 10
 entries (especially if we ignore two of 'em).  With 100 or more,
 it sounds all right.  What's the breakpoint?

 Yes, I think 100-200 could be a good breakpoint.

I've committed this change with (for now) 100 as the minimum histogram
size to use.  Stefan, are you interested in retrying your benchmark?

regards, tom lane



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-20 Thread Matteo Beccati

Hi,

Tom Lane wrote:

I've committed this change with (for now) 100 as the minimum histogram
size to use.  Stefan, are you interested in retrying your benchmark?


A first try with ltree gave big improvements on my smaller data set: the 
estimated row count is correct or off by only 1 row. I'm now restoring a 
bigger database to get more reliable results.


I hope Stefan can confirm the improvement on dbt3 too.

Thanks Tom :)


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-19 Thread Mark Cave-Ayland
On Mon, 2006-09-18 at 17:46 +0200, Matteo Beccati wrote:
 Tom Lane wrote:
  Matteo Beccati [EMAIL PROTECTED] writes:
  I cannot see anything bad by using something like that:
  if (histogram is large/representative enough)
  
  Well, the question is exactly what is large enough?  I feel a bit
  uncomfortable about applying the idea to a histogram with only 10
  entries (especially if we ignore two of 'em).  With 100 or more,
  it sounds all right.  What's the breakpoint?
 
 Yes, I think 100-200 could be a good breakpoint. I don't actually know 
 what is the current usage of SET STATISTICS, I usually set it to 1000 
 for columns which need more precise selectivity.
 
 The breakpoint could be set even higher (500?) so there is space to 
 increase statistics without enabling the histogram check, but I don't 
 feel very comfortable though suggesting this kind of possibly 
 undocumented side effect...


Hi everyone,

You may be interested to have a look at the statistics collector for the
geometry type within PostGIS. In order to prevent very large or very
small geometries from ruining the statistics histogram and generating
incorrect query plans, we make the assumption that the column
distribution is likely to be close to normal, and then remove any
ANALYZE-collected geometries from the set that lie outside +/- 3.25
standard deviations from the mean before creating the final histogram
(removes just under 1% of the data from each end of an assumed normal
distribution). This works well and AFAIK we've only ever had one
reported case of an incorrect query plan being generated using this
method.
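The trimming step Mark describes can be sketched roughly as follows. This is illustrative only (simple numeric sizes, population standard deviation, a made-up function name); PostGIS's actual C implementation differs in detail.

```python
import statistics

# Sketch of the +/- 3.25-standard-deviation trim described above.
def trim_outliers(values, n_sigma=3.25):
    """Drop values lying more than n_sigma standard deviations from the
    mean before building a histogram from the rest."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return list(values)
    return [v for v in values if abs(v - mean) <= n_sigma * sd]

# One huge geometry no longer dominates the histogram's range:
sample = [10.0] * 100 + [1_000_000.0]
print(len(trim_outliers(sample)))  # 100
```

Under an assumed normal distribution, a +/- 3.25-sigma cut removes just under 1% of the data from each tail, as the message notes.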


Kind regards,

Mark.





Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-19 Thread Stefan Kaltenbrunner
Tom Lane wrote:
 Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I'm not sure what we could do about the concurrent-sessions issue, but
 we could make some sort of attack on the query complexity issue by
 pro-rating the effective_cache_size among all the tables used by a
 query.
 
 hmm not sure i understand what you mean here :-(
 
 Per the comment for index_pages_fetched:
 
  * We assume that effective_cache_size is the total number of buffer pages
  * available for both table and index, and pro-rate that space between the
  * table and index.  (Ideally other_pages should include all the other
  * tables and indexes used by the query too; but we don't have a good way
  * to get that number here.)
 
 A first-order approximation to this would be to add up the total sizes
 of all the other tables used in the query.  I am thinking of leaving out
 other indexes, mainly because we can't tell at this level which other
 indexes are actually gonna get used.  This would tend to underestimate
 by leaving out indexes, but not by a lot if you assume indexes are much
 smaller than their tables.  It would also be an overestimate because
 tables that are not indexscanned concurrently with the one under
 consideration probably shouldn't be counted anyway.  So one might hope
 these effects would more or less cancel out.  Anyway it seems to be a
 better idea than what we have now.


aah - I think I understand that logic now - thanks for the reference to
the source :-)

 
 I will redo with lower settings - do you have any suggestions for that ?
 
 Try reducing effective_cache_size to maybe a fourth of what it is now.
 If that helps the thing pick better plans for these multi-table queries,
 then we should try changing the other_pages calculation as above.

ok - the planner switches to a different plan at about 2.5GB of
effective_cache_size resulting in the following plan:

http://www.kaltenbrunner.cc/files/analyze_q7_1GB.txt (3 consecutive runs
- starting with cold caches)

with 6GB I get:

http://www.kaltenbrunner.cc/files/analyze_q7_6GB.txt (single run -
immediately after the above ones)


Stefan



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-19 Thread Tom Lane
Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 ok - the planner switches to a different plan at about 2.5GB of
 effective_cache_size resulting in the following [ much better ] plan:

OK, so it sounds like it'd be a good idea to try to pro-rate
effective_cache_size among all the tables in the query.  I'll see
if I can get that in without kluging the code up too much.

regards, tom lane



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-18 Thread Matteo Beccati

Hi,

Tom Lane wrote:

Hmm ... pattern_sel already applies the operator directly to the
most_common_vals, but in this situation those aren't common enough
to help much.  With such an extensive histogram it is awfully tempting
to assume that the histogram members are a representative sample, and
take the selectivity as being the fraction of histogram entries that
match the pattern.  Maybe drop the first and last histogram entries
on the grounds they're probably outliers.  Thoughts?  What would be a
reasonable minimum histogram size to enable using this approach instead
of the guess-on-the-basis-of-the-pattern code?


That's what I was suggesting here respectively for ltree operators and like:

http://archives.postgresql.org/pgsql-patches/2006-05/msg00178.php
http://archives.postgresql.org/pgsql-performance/2006-01/msg00083.php

My original ltree patch was stripped of the histogram matching code and 
I will need to re-patch 8.2 when deploying it to get decent performance 
with a couple of queries, but it would be very nice to avoid it ;)


I cannot see anything bad by using something like that:

if (histogram is large/representative enough)
{
  recalculate_selectivity_matching_histogram_values()

  if (new_selectivity > old_selectivity)
return new_selectivity
  else
return old_selectivity
}


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-18 Thread Tom Lane
Matteo Beccati [EMAIL PROTECTED] writes:
 I cannot see anything bad by using something like that:
 if (histogram is large/representative enough)

Well, the question is exactly what is large enough?  I feel a bit
uncomfortable about applying the idea to a histogram with only 10
entries (especially if we ignore two of 'em).  With 100 or more,
it sounds all right.  What's the breakpoint?

regards, tom lane



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-18 Thread Matteo Beccati

Tom Lane wrote:

Matteo Beccati [EMAIL PROTECTED] writes:

I cannot see anything bad by using something like that:
if (histogram is large/representative enough)


Well, the question is exactly what is large enough?  I feel a bit
uncomfortable about applying the idea to a histogram with only 10
entries (especially if we ignore two of 'em).  With 100 or more,
it sounds all right.  What's the breakpoint?


Yes, I think 100-200 could be a good breakpoint. I don't actually know 
what the current usage of SET STATISTICS is; I usually set it to 1000 
for columns that need more precise selectivity estimates.


The breakpoint could be set even higher (500?) so there is space to 
increase statistics without enabling the histogram check, but I don't 
feel very comfortable suggesting this kind of possibly undocumented 
side effect...



Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-17 Thread Stefan Kaltenbrunner
[already sent a variant of that yesterday but it doesn't look like it
made it to the list]

Tom Lane wrote:
 Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Apparently we've made the planner a bit too optimistic about the savings
 that can be expected from repeated indexscans occurring on the inside of
 a join.
 
 effective_cache_size was set to 10GB (my fault for copying over the conf
 from a 16GB box) during the run - lowering it just a few megabytes(!) or
 to a more realistic 6GB results in the following MUCH better plan:
 http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt
 
 Interesting.  It used to be that effective_cache_size wasn't all that
 critical... what I think this report is showing is that with the 8.2
 changes to try to account for caching effects in repeated indexscans,
 we've turned that into a pretty significant parameter.

took me a while due to hardware issues on my testbox - but there are new
results (with 6GB for effective_cache_size) up at:

http://www.kaltenbrunner.cc/files/5/

there are still a few issues with the validity of the run, like the rf
tests not actually being done right - but lowering effective_cache_size
gave a dramatic speedup on Q5, Q7 and Q8.

that is the explain for the 4h+ Q9:

http://www.kaltenbrunner.cc/files/analyze_q9.txt

increasing the statistics target up to 1000 does not seem to change
the plan btw.

disabling nested loops leads to the following (4 times faster) plan:

http://www.kaltenbrunner.cc/files/analyze_q9_no_nest.txt

since the hash joins in there look rather slow (an inappropriate hash
table setup due to the wrong estimates?) I disabled hash joins too:

http://www.kaltenbrunner.cc/files/analyze_q9_no_nest_no_hashjoin.txt

and amazingly this plan is by far the fastest one in runtime (15min vs
4.5h ...) except that the planner thinks it is 20 times more expensive ...


Stefan



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-17 Thread Stefan Kaltenbrunner
Stefan Kaltenbrunner wrote:
 [already sent a variant of that yesterday but it doesn't look like it
 made it to the list]
 
 Tom Lane wrote:
 Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Apparently we've made the planner a bit too optimistic about the savings
 that can be expected from repeated indexscans occurring on the inside of
 a join.
 effective_cache_size was set to 10GB(my fault for copying over the conf
 from a 16GB box) during the run - lowering it just a few megabytes(!) or
 to a more realistic 6GB results in the following MUCH better plan:
 http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt
 Interesting.  It used to be that effective_cache_size wasn't all that
 critical... what I think this report is showing is that with the 8.2
 changes to try to account for caching effects in repeated indexscans,
 we've turned that into a pretty significant parameter.
 
 took me a while due to hardware issues on my testbox - but there are new
 results(with 6GB for effective_cache_size) up at:
 
 http://www.kaltenbrunner.cc/files/5/
 
 there are still a few issues with the validity of the run like the rf
 tests not actually being done right - but lowering effective_cache_size
 gave a dramatic speedup on Q5, Q7 and Q8.
 
 that is the explain for the 4h+ Q9:
 
 http://www.kaltenbrunner.cc/files/analyze_q9.txt
 
 increasing the statistics target up to 1000 does not seem to change
 the plan btw.
 
 disabling nested loop leads to the following (4 times faster) plan:
 
 http://www.kaltenbrunner.cc/files/analyze_q9_no_nest.txt
 
 since the hash-joins in there look rather slow (inappropriate hashtable
 set up due to the wrong estimates?) I disabled hash_joins too:
 
 http://www.kaltenbrunner.cc/files/analyze_q9_no_nest_no_hashjoin.txt
 
 and amazingly this plan is by far the fastest one in runtime (15min vs
 4.5h ...) except that the planner thinks it is 20 times more expensive ...

some additional numbers (the first is with default settings, the second
with enable_nestloop = 'off', the third with enable_nestloop = 'off'
and enable_hashjoin = 'off'):

http://www.kaltenbrunner.cc/files/analyze_q7.txt

here we have a 3x speedup from disabling nested loops, and a 2x speedup
(over the original plan) with both nested loops and hash joins disabled.


http://www.kaltenbrunner.cc/files/analyze_q20.txt

here we have a 180x(!) speedup with both planner options disabled ...

it is worth mentioning that for both queries the estimated costs in
relation to each other look quite reasonable as soon as enable_nestloop
= 'off' (i.e. 5042928 vs 10715247 with 344sec vs 514sec for Q7, and
101441851 vs 101445468 with 10sec vs 11sec)


Stefan



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-17 Thread Tom Lane
Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 that is the explain for the 4h+ Q9:
 http://www.kaltenbrunner.cc/files/analyze_q9.txt

The big problem there seems to be the drastic misestimation of the
number of rows matching the p_name ~~ '%ghost%' condition.  What does
pg_stats have for the p_name column?

regards, tom lane



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-17 Thread Stefan Kaltenbrunner
Tom Lane wrote:
 Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 that is the explain for the 4h+ Q9:
 http://www.kaltenbrunner.cc/files/analyze_q9.txt
 
 The big problem there seems to be the drastic misestimation of the
 number of rows matching the p_name ~~ '%ghost%' condition.  What does
 pg_stats have for the p_name column?

http://www.kaltenbrunner.cc/files/pg_stat_p_name.txt


Stefan



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-17 Thread Tom Lane
Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 some additional numbers(first one is with default settings, second is
 with enable_nestloop = 'off', third one is with enable_nestloop = 'off'
 and enable_hashjoin='off'):

 http://www.kaltenbrunner.cc/files/analyze_q7.txt

I'm inclined to think you still have effective_cache_size set too high;
or at least that the planner is being too optimistic about how much
cache space is actually available to each indexscan.

With the code as it currently stands, effective_cache_size has some of
the same properties as work_mem: the planner effectively assumes that
that much space is available to *each* indexscan, and so you'd need to
de-rate the setting based on the complexity of queries and the number of
concurrent sessions.

I'm not sure what we could do about the concurrent-sessions issue, but
we could make some sort of attack on the query complexity issue by
pro-rating the effective_cache_size among all the tables used by a
query.


 http://www.kaltenbrunner.cc/files/analyze_q20.txt
 here we have a 180x(!) speedup with both disabled planner options ...

There's something awfully bogus about that one --- how is it that the
aggregate subplan, with the exact same plan and same number of
executions in all three cases, has an actual runtime 200x more in the
first case?

regards, tom lane



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-17 Thread Tom Lane
Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 The big problem there seems to be the drastic misestimation of the
 number of rows matching the p_name ~~ '%ghost%' condition.  What does
 pg_stats have for the p_name column?

 http://www.kaltenbrunner.cc/files/pg_stat_p_name.txt

Hmm ... pattern_sel already applies the operator directly to the
most_common_vals, but in this situation those aren't common enough
to help much.  With such an extensive histogram it is awfully tempting
to assume that the histogram members are a representative sample, and
take the selectivity as being the fraction of histogram entries that
match the pattern.  Maybe drop the first and last histogram entries
on the grounds they're probably outliers.  Thoughts?  What would be a
reasonable minimum histogram size to enable using this approach instead
of the guess-on-the-basis-of-the-pattern code?
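The idea Tom sketches can be written out like this. The helper below is purely illustrative (hypothetical name, made-up histogram values); the actual change went into the C pattern-selectivity code.

```python
# Sketch: treat histogram boundary entries as a representative sample for
# estimating the selectivity of col LIKE '%substring%'.
def histogram_like_selectivity(histogram, substring, min_size=100):
    """Return the fraction of inner histogram entries containing
    'substring', or None if the histogram is too small to trust."""
    if len(histogram) < min_size:
        return None  # caller falls back to the pattern-based guess
    inner = histogram[1:-1]  # drop first/last entries as likely outliers
    matches = sum(1 for v in inner if substring in v)
    return matches / len(inner)

# Made-up 102-entry histogram where 1 in 10 inner values contains "ghost":
hist = [f"part {i} {'ghost' if i % 10 == 0 else 'linen'}" for i in range(102)]
print(histogram_like_selectivity(hist, "ghost"))  # 0.1
```

With a statistics target of 1000 the histogram has on the order of a thousand entries, so the sampled fraction can be far more accurate for a leading-wildcard pattern than any guess based on the pattern's shape.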

regards, tom lane



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-17 Thread Stefan Kaltenbrunner
Tom Lane wrote:
 Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 some additional numbers(first one is with default settings, second is
 with enable_nestloop = 'off', third one is with enable_nestloop = 'off'
 and enable_hashjoin='off'):
 
 http://www.kaltenbrunner.cc/files/analyze_q7.txt
 
 I'm inclined to think you still have effective_cache_size set too high;
 or at least that the planner is being too optimistic about how much
 cache space is actually available to each indexscan.

I have long-term external monitoring on that server and it indeed shows
that there was never less than about 5.8GB of buffer cache used (or
more than 2.2GB used by other means). So 6GB might still be a bit on the
optimistic side, but it is not actually that far off from reality.
I will redo with lower settings - do you have any suggestions for that?

 
 With the code as it currently stands, effective_cache_size has some of
 the same properties as work_mem: the planner effectively assumes that
 that much space is available to *each* indexscan, and so you'd need to
 de-rate the setting based on the complexity of queries and the number of
 concurrent sessions.

concurrency is 1 here - there is never more than a single query running
in parallel in those tests.

 
 I'm not sure what we could do about the concurrent-sessions issue, but
 we could make some sort of attack on the query complexity issue by
 pro-rating the effective_cache_size among all the tables used by a
 query.

hmm not sure i understand what you mean here :-(

 
 
 http://www.kaltenbrunner.cc/files/analyze_q20.txt
 here we have a 180x(!) speedup with both disabled planner options ...
 
 There's something awfully bogus about that one --- how is it that the
 aggregate subplan, with the exact same plan and same number of
 executions in all three cases, has an actual runtime 200x more in the
 first case?

hmm - good question. I will redo those in a bit ...


Stefan



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-17 Thread Tom Lane
Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I'm not sure what we could do about the concurrent-sessions issue, but
 we could make some sort of attack on the query complexity issue by
 pro-rating the effective_cache_size among all the tables used by a
 query.

 hmm not sure i understand what you mean here :-(

Per the comment for index_pages_fetched:

 * We assume that effective_cache_size is the total number of buffer pages
 * available for both table and index, and pro-rate that space between the
 * table and index.  (Ideally other_pages should include all the other
 * tables and indexes used by the query too; but we don't have a good way
 * to get that number here.)

A first-order approximation to this would be to add up the total sizes
of all the other tables used in the query.  I am thinking of leaving out
other indexes, mainly because we can't tell at this level which other
indexes are actually gonna get used.  This would tend to underestimate
by leaving out indexes, but not by a lot if you assume indexes are much
smaller than their tables.  It would also be an overestimate because
tables that are not indexscanned concurrently with the one under
consideration probably shouldn't be counted anyway.  So one might hope
these effects would more or less cancel out.  Anyway it seems to be a
better idea than what we have now.
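
For illustration only, here is a rough Python sketch of the estimate under
discussion (a hypothetical model of index_pages_fetched in costsize.c with the
pro-rating applied; parameter names are illustrative, not the actual C code):

```python
import math

def index_pages_fetched(tuples_fetched, table_pages, other_pages, cache_pages):
    """Rough model of the planner's Mackert-Lohman-based page-fetch estimate,
    with effective_cache_size (cache_pages) pro-rated between this table and
    the query's other tables (other_pages). Illustrative only."""
    T = max(float(table_pages), 1.0)
    total = max(T + other_pages, 1.0)
    # this table's pro-rated share of the cache, in pages
    b = cache_pages * T / total
    if T <= b:
        # table assumed to fit in its cache share: no page is fetched twice
        pf = 2.0 * T * tuples_fetched / (2.0 * T + tuples_fetched)
        return min(math.ceil(pf), T)
    lim = 2.0 * T * b / (2.0 * T - b)
    if tuples_fetched <= lim:
        pf = 2.0 * T * tuples_fetched / (2.0 * T + tuples_fetched)
    else:
        # beyond lim, additional fetches miss the cache at rate (T - b) / T
        pf = b + (tuples_fetched - lim) * (T - b) / T
    return math.ceil(pf)

# 100k tuples from a 50k-page table, with 200k pages of cache:
alone = index_pages_fetched(100000, 50000, 0, 200000)        # table fits its share
shared = index_pages_fetched(100000, 50000, 450000, 200000)  # share shrinks to 20k pages
```

In the second call the same scan is charged many repeat fetches because its
pro-rated cache share is small - which is exactly how counting the other
tables would push the planner away from the repeated-indexscan plans above.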

 I will redo with lower settings - do you have any suggestions for that ?

Try reducing effective_cache_size to maybe a fourth of what it is now.
If that helps the thing pick better plans for these multi-table queries,
then we should try changing the other_pages calculation as above.

regards, tom lane



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-13 Thread Stefan Kaltenbrunner

Tom Lane wrote:

Stefan Kaltenbrunner [EMAIL PROTECTED] writes:

Tom Lane wrote:

Could we see the actual EXPLAIN ANALYZE results for the slow plan?



http://www.kaltenbrunner.cc/files/dbt3_explain_analyze.txt


Well, indeed it seems that the hash join is just an innocent bystander:
the bulk of the runtime (all but about 120 sec in fact) is spent here:

 ->  Nested Loop  (cost=13.65..1719683.85 rows=12000672 width=49) (actual time=60.325..24923860.713 rows=11897899 loops=1)
   ->  Merge Join  (cost=0.00..10248.66 rows=2 width=41) (actual time=16.654..2578.060 rows=19837 loops=1)
       ...
   ->  Bitmap Heap Scan on lineitem  (cost=13.65..77.16 rows=665 width=16) (actual time=13.492..1254.535 rows=600 loops=19837)
         Recheck Cond: (lineitem.l_suppkey = supplier.s_suppkey)
         ->  Bitmap Index Scan on i_l_suppkey  (cost=0.00..13.65 rows=665 width=0) (actual time=10.662..10.662 rows=600 loops=19837)
               Index Cond: (lineitem.l_suppkey = supplier.s_suppkey)

I suppose that the profile result you showed was taken during the
startup transient where it was computing the hashtables that this loop's
results are joined to ... but that's not where the problem is.  The
problem is repeating that bitmap scan on lineitem for nearly 20,000
different l_suppkeys.


possible - I actually took them over a longer period of time



Apparently we've made the planner a bit too optimistic about the savings
that can be expected from repeated indexscans occurring on the inside of
a join.  The other plan uses a different join order and doesn't try to
join lineitem until it's got orders.o_orderkey, whereupon it does a
mergejoin against an indexscan on lineitem:

 ->  Index Scan using i_l_orderkey on lineitem  (cost=0.00..2715943.34 rows=60003360 width=16) (actual time=32.868..123668.380 rows=59991868 loops=1)

The runtimes for the remainders of the plans are roughly comparable, so
it's the cost of joining lineitem that is hurting here.

Is lineitem sorted (or nearly sorted) by l_orderkey?  Part of the
problem could be overestimating the cost of this indexscan.

What are the physical sizes of lineitem and its indexes, and how do
those compare to your RAM?  What are you using for planner settings
(particularly effective_cache_size)?


ouch - you are right (as usual) here.
effective_cache_size was set to 10GB (my fault for copying over the conf
from a 16GB box) during the run - lowering it by just a few megabytes(!), or
to a more realistic 6GB, results in the following MUCH better plan:

http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt

as for the relation sizes:

dbt3=# select pg_relation_size('lineitem');
 pg_relation_size
------------------
      10832764928
(1 row)

dbt3=# select pg_total_relation_size('lineitem');
 pg_total_relation_size
------------------------
            22960259072
(1 row)

there are nine btree indexes on lineitem, all between 1.1GB and 1.4GB in
size.



Stefan



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-13 Thread Tom Lane
Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Apparently we've made the planner a bit too optimistic about the savings
 that can be expected from repeated indexscans occurring on the inside of
 a join.

 effective_cache_size was set to 10GB(my fault for copying over the conf
 from a 16GB box) during the run - lowering it just a few megabytes(!) or
 to a more realistic 6GB results in the following MUCH better plan:
 http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt

Interesting.  It used to be that effective_cache_size wasn't all that
critical... what I think this report is showing is that with the 8.2
changes to try to account for caching effects in repeated indexscans,
we've turned that into a pretty significant parameter.

It'd be nice not to have to depend on the DBA to give us a good number
for this setting.  But I don't know of any portable ways to find out
how much RAM is in the box, let alone what fraction of it we should
assume is available per-query.

regards, tom lane



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-13 Thread Stefan Kaltenbrunner

Tom Lane wrote:

Stefan Kaltenbrunner [EMAIL PROTECTED] writes:

Tom Lane wrote:

Apparently we've made the planner a bit too optimistic about the savings
that can be expected from repeated indexscans occurring on the inside of
a join.



effective_cache_size was set to 10GB(my fault for copying over the conf
from a 16GB box) during the run - lowering it just a few megabytes(!) or
to a more realistic 6GB results in the following MUCH better plan:
http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt


Interesting.  It used to be that effective_cache_size wasn't all that
critical... what I think this report is showing is that with the 8.2
changes to try to account for caching effects in repeated indexscans,
we've turned that into a pretty significant parameter.


yes, I'm a bit worried about that too - it has been conventional
wisdom that setting effective_cache_size optimistically will never hurt,
and that it encourages PostgreSQL to sometimes get a better plan by
favouring index scans.




It'd be nice not to have to depend on the DBA to give us a good number
for this setting.  But I don't know of any portable ways to find out
how much RAM is in the box, let alone what fraction of it we should
assume is available per-query.


well, there are really a number of things for which the DBA had better give
accurate information to the database - though in that case we might go
from "too much won't hurt" to "too much will hurt" ...



Stefan



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-13 Thread David Fetter
On Wed, Sep 13, 2006 at 10:47:09AM -0400, Tom Lane wrote:
 Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  Apparently we've made the planner a bit too optimistic about the savings
  that can be expected from repeated indexscans occurring on the inside of
  a join.
 
  effective_cache_size was set to 10GB(my fault for copying over the conf
  from a 16GB box) during the run - lowering it just a few megabytes(!) or
  to a more realistic 6GB results in the following MUCH better plan:
  http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt
 
 Interesting.  It used to be that effective_cache_size wasn't all
 that critical... what I think this report is showing is that with
 the 8.2 changes to try to account for caching effects in repeated
 indexscans, we've turned that into a pretty significant parameter.
 
 It'd be nice not to have to depend on the DBA to give us a good
 number for this setting.  But I don't know of any portable ways to
 find out how much RAM is in the box, let alone what fraction of it
 we should assume is available per-query.

That's fairly straightforward, if a little crude.  We ask the DBA and
provide some tools for estimating and tuning same. :)

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 415 235 3778AIM: dfetter666
  Skype: davidfetter

Remember to vote!



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-12 Thread Stefan Kaltenbrunner

Tom Lane wrote:

Stefan Kaltenbrunner [EMAIL PROTECTED] writes:

btw - the "hashjoin is bad" claim was more or less based on the observation
that nearly all of the CPU time is burned in hash-related functions in the
profile (when profiling over a longer period of time those accumulate
an even larger share of the time than in the short profile I included in the
original report)


[ shrug... ]  Two out of the three functions you mentioned are not used
by hash join, and anyway the other plan probably has a comparable
execution density in sort-related functions; does that make it bad?


hmm sorry for that - I should have checked the source before I made that 
assumption :-(




It's possible that the large time for ExecScanHashBucket has something
to do with skewed usage of the hash buckets due to an unfortunate data
distribution, but that's theorizing far in advance of the data.


http://www.kaltenbrunner.cc/files/4/

has preliminary data of the dbt3/scaling 10 run I did which seems to 
imply we have at least 4 queries in there that take an excessive amount 
of time (query 5 is the one I started the complaint with).
However those results have to be taken with a grain of salt, since there
is an apparent bug in the dbt3 code which seems to rely on
add_missing_from=on (as can be seen in some of the error logs of the
database), and towards the end of the throughput run I did some of the
explain analyzes for the report (those are the small 100% spikes in the
graph due to the box using the second CPU to run them).

I will redo those tests later this week though ...

Stefan



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-12 Thread Tom Lane
Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Could we see the actual EXPLAIN ANALYZE results for the slow plan?

 http://www.kaltenbrunner.cc/files/dbt3_explain_analyze.txt

Well, indeed it seems that the hash join is just an innocent bystander:
the bulk of the runtime (all but about 120 sec in fact) is spent here:

 ->  Nested Loop  (cost=13.65..1719683.85 rows=12000672 width=49) (actual time=60.325..24923860.713 rows=11897899 loops=1)
   ->  Merge Join  (cost=0.00..10248.66 rows=2 width=41) (actual time=16.654..2578.060 rows=19837 loops=1)
       ...
   ->  Bitmap Heap Scan on lineitem  (cost=13.65..77.16 rows=665 width=16) (actual time=13.492..1254.535 rows=600 loops=19837)
         Recheck Cond: (lineitem.l_suppkey = supplier.s_suppkey)
         ->  Bitmap Index Scan on i_l_suppkey  (cost=0.00..13.65 rows=665 width=0) (actual time=10.662..10.662 rows=600 loops=19837)
               Index Cond: (lineitem.l_suppkey = supplier.s_suppkey)

I suppose that the profile result you showed was taken during the
startup transient where it was computing the hashtables that this loop's
results are joined to ... but that's not where the problem is.  The
problem is repeating that bitmap scan on lineitem for nearly 20,000
different l_suppkeys.

Apparently we've made the planner a bit too optimistic about the savings
that can be expected from repeated indexscans occurring on the inside of
a join.  The other plan uses a different join order and doesn't try to
join lineitem until it's got orders.o_orderkey, whereupon it does a
mergejoin against an indexscan on lineitem:

 ->  Index Scan using i_l_orderkey on lineitem  (cost=0.00..2715943.34 rows=60003360 width=16) (actual time=32.868..123668.380 rows=59991868 loops=1)

The runtimes for the remainders of the plans are roughly comparable, so
it's the cost of joining lineitem that is hurting here.

Is lineitem sorted (or nearly sorted) by l_orderkey?  Part of the
problem could be overestimating the cost of this indexscan.

What are the physical sizes of lineitem and its indexes, and how do
those compare to your RAM?  What are you using for planner settings
(particularly effective_cache_size)?

regards, tom lane



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-11 Thread Tom Lane
Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 I'm about to do some benchmarking on -HEAD on some hardware I have
 available, and it seems I'm hitting a rather weird issue causing the osdl
 dbt3 benchmark to run very slowly and eat CPU time for hours ...

Could we see the actual EXPLAIN ANALYZE results for the slow plan?
I'm unconvinced by your "hash join is bad" analysis, especially in
the cases where you're giving it lots of work_mem.  I think it's got
something to do with the different join orders.  The rowcount estimates
in the fast plan all seem pretty good, but I'm betting something is
wrong with some of them in the slow case.

regards, tom lane



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-11 Thread Stefan Kaltenbrunner
Tom Lane wrote:
 Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 I'm about to do some benchmarking on -HEAD on some hardware I have
 available, and it seems I'm hitting a rather weird issue causing the osdl
 dbt3 benchmark to run very slowly and eat CPU time for hours ...
 
 Could we see the actual EXPLAIN ANALYZE results for the slow plan?
 I'm unconvinced by your "hash join is bad" analysis, especially in
 the cases where you're giving it lots of work_mem.  I think it's got
 something to do with the different join orders.  The rowcount estimates
 in the fast plan all seem pretty good, but I'm betting something is
 wrong with some of them in the slow case.

will do - but that will take a while - the slow one runs for 12h or so
even without explain analyze overhead ...


Stefan



Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-11 Thread Tom Lane
Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 btw - the "hashjoin is bad" claim was more or less based on the observation
 that nearly all of the CPU time is burned in hash-related functions in the
 profile (when profiling over a longer period of time those accumulate
 an even larger share of the time than in the short profile I included in the
 original report)

[ shrug... ]  Two out of the three functions you mentioned are not used
by hash join, and anyway the other plan probably has a comparable
execution density in sort-related functions; does that make it bad?

It's possible that the large time for ExecScanHashBucket has something
to do with skewed usage of the hash buckets due to an unfortunate data
distribution, but that's theorizing far in advance of the data.
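
As a toy illustration of that theory (plain Python, not PostgreSQL code):
with a badly skewed build-side key distribution, every probe has to walk one
long chain, which is where time attributed to ExecScanHashBucket would pile up.

```python
from collections import defaultdict

def probe_comparisons(build_keys, probe_keys, nbuckets):
    """Toy chained hash table: count the key comparisons done while probing,
    as a stand-in for work done scanning hash-bucket chains."""
    buckets = defaultdict(list)
    for k in build_keys:
        buckets[hash(k) % nbuckets].append(k)
    total = 0
    for k in probe_keys:
        # every probe walks the whole chain in its bucket
        total += len(buckets[hash(k) % nbuckets])
    return total

# 1000 distinct keys spread over 1024 buckets vs. 1000 copies of one key
uniform = probe_comparisons(list(range(1000)), list(range(1000)), 1024)
skewed = probe_comparisons([42] * 1000, [42] * 1000, 1024)
# in the skewed case every probe scans a 1000-entry chain, so the same
# input size costs 1000x the comparisons of the uniform case
```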

regards, tom lane
