Re: [GENERAL] why sequential scan is used on indexed column ???

2008-06-16 Thread Michael Fuhr
On Sat, Jun 14, 2008 at 02:35:38PM -0400, Tom Lane wrote:
> Michael Fuhr <[EMAIL PROTECTED]> writes:
> > I created a test case that has close to the same estimated and
> > actual row counts and has the same plan if I disable enable_nestloop:
> 
> There's something weird about this --- why does the second plan seqscan
> b_saskaita, instead of using the bitmap scan that it had previously
> estimated to be cheaper?

Dunno.

> What PG version are you testing, and can you provide the full test case?

My test was in 8.2.9, the only version I had handy at the time.  I
later tested 8.1.13 (Julius said he was running 8.1.4) and got the
same plan that Julius got without messing with planner settings.

I don't have access to my test case right now but I'll post it when
I get a chance.  I simply populated the tables with random data,
adjusting the amount and distribution until I got row count estimates
close to what Julius got.  I don't know if my test case is close
enough to Julius's data to be relevant to his problem but if you think
my results are weird then maybe I've stumbled across something else
that's interesting.

> (As for the original question, the hash plan seems to me to be perfectly
> reasonable for the estimated row counts --- fetching one row out of
> fifty using an indexscan is going to be expensive.  So I think the OP's
> problem is purely a statistical one, or maybe he's in a situation where
> he should reduce random_page_cost.)

Hmmm...8.1.13 wants to do the hash join that you think would be
reasonable but 8.2.9 prefers the nested loop as in my second example.
I think I did have a reduced random_page_cost (2 as I recall).

-- 
Michael Fuhr

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] why sequential scan is used on indexed column ???

2008-06-15 Thread Julius Tuskenis

Hi Michael.

Thank you for your answer. I've checked - enable_nestloop is true. I did 
ANALYZE, but that didn't help. The sequential scan is still used Any 
more ideas why?


Julius Tuskenis



Michael Fuhr rašė:

On Sat, Jun 14, 2008 at 04:59:44PM +0200, Andreas Kretschmer wrote:
  

Julius Tuskenis <[EMAIL PROTECTED]> schrieb:

I have a question concerning performance. One of my queries take a long  
to execute. I tried to do "explain analyse" and I see that the  
sequential scan is being used, although I have indexes set on columns  
that are used in joins. The question is - WHY, and how to change that  
behavior???
  

Try to create an index on apsilankymai.sas_id



In the DDL that Julius posted apsilankymai doesn't have an sas_id
column.

The join is on apsilankymai.aps_saskaita = b_saskaita.sas_id.  Both
columns have an index: b_saskaita.sas_id is a primary key so it
should have an index implicitly, and apsilankymai.aps_saskaita has
an explicit CREATE INDEX statement.  The WHERE clause is on
b_saskaita.sas_subjektas, which also has an explicit CREATE INDEX
statement.  Unless I'm mistaken all relevant columns have an index.

A few of the row count estimates differ from reality:

  

Hash Join  (cost=5.17..10185.89 rows=6047 width=138) (actual 
time=10698.539..10698.539 rows=0 loops=1)



  

Bitmap Heap Scan on b_saskaita  (cost=2.03..5.14 rows=9 width=96) (actual 
time=31.473..31.489 rows=1 loops=1)



However, that might not be entirely responsible for the questionable
plan.  I created a test case that has close to the same estimated and
actual row counts and has the same plan if I disable enable_nestloop:

set enable_nestloop to off;

explain analyze
select *
FROM b_saskaita JOIN apsilankymai ON (aps_saskaita = sas_id)
where sas_subjektas = 20190;

  QUERY PLAN  
--

 Hash Join  (cost=6.54..5814.42 rows=5406 width=286) (actual 
time=3222.429..3222.429 rows=0 loops=1)
   Hash Cond: (apsilankymai.aps_saskaita = b_saskaita.sas_id)
   ->  Seq Scan on apsilankymai  (cost=0.00..4627.50 rows=300350 width=42) 
(actual time=0.085..1514.863 rows=300350 loops=1)
   ->  Hash  (cost=6.43..6.43 rows=9 width=244) (actual time=0.122..0.122 
rows=1 loops=1)
 ->  Bitmap Heap Scan on b_saskaita  (cost=2.32..6.43 rows=9 width=244) 
(actual time=0.089..0.095 rows=1 loops=1)
   Recheck Cond: (sas_subjektas = 20190)
   ->  Bitmap Index Scan on fki_sas_subjektas  (cost=0.00..2.32 
rows=9 width=0) (actual time=0.066..0.066 rows=1 loops=1)
 Index Cond: (sas_subjektas = 20190)
 Total runtime: 3222.786 ms

I get a better plan if I enable nested loops:

set enable_nestloop to on;

explain analyze
select *
FROM b_saskaita JOIN apsilankymai ON (aps_saskaita = sas_id)
where sas_subjektas = 20190;

QUERY PLAN 
---

 Nested Loop  (cost=79.93..4660.23 rows=5406 width=286) (actual 
time=1.000..1.000 rows=0 loops=1)
   ->  Seq Scan on b_saskaita  (cost=0.00..10.25 rows=9 width=244) (actual 
time=0.116..0.870 rows=1 loops=1)
 Filter: (sas_subjektas = 20190)
   ->  Bitmap Heap Scan on apsilankymai  (cost=79.93..441.58 rows=6007 
width=42) (actual time=0.084..0.084 rows=0 loops=1)
 Recheck Cond: (apsilankymai.aps_saskaita = b_saskaita.sas_id)
 ->  Bitmap Index Scan on idx_aps_saskaita  (cost=0.00..78.43 rows=6007 
width=0) (actual time=0.068..0.068 rows=0 loops=1)
   Index Cond: (apsilankymai.aps_saskaita = b_saskaita.sas_id)
 Total runtime: 1.321 ms

Julius, do you perchance have enable_nestloop = off?  If so, do you
get a better plan if you enable it?  Also, have you run ANALYZE
lately?

  


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] why sequential scan is used on indexed column ???

2008-06-15 Thread Julius Tuskenis

Hello, Tom.


  So I think the OP's
problem is purely a statistical one, or maybe he's in a situation where
he should reduce random_page_cost.)
  
What could be done solving that "statistical problem"? :)  Current value 
for random_page_cost is 4. What value would you suggest?


Julius Tuskenis

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] why sequential scan is used on indexed column ???

2008-06-15 Thread Andreas Kretschmer
Michael Fuhr <[EMAIL PROTECTED]> schrieb:

> On Sat, Jun 14, 2008 at 04:59:44PM +0200, Andreas Kretschmer wrote:
> > Julius Tuskenis <[EMAIL PROTECTED]> schrieb:
> > > I have a question concerning performance. One of my queries take a long  
> > > to execute. I tried to do "explain analyse" and I see that the  
> > > sequential scan is being used, although I have indexes set on columns  
> > > that are used in joins. The question is - WHY, and how to change that  
> > > behavior???
> > 
> > Try to create an index on apsilankymai.sas_id
> 
> In the DDL that Julius posted apsilankymai doesn't have an sas_id
> column.
> 
> The join is on apsilankymai.aps_saskaita = b_saskaita.sas_id.  Both
> columns have an index: b_saskaita.sas_id is a primary key so it
> should have an index implicitly, and apsilankymai.aps_saskaita has

Right, my mistake.


Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.  (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.  N 51.05082°, E 13.56889°

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] why sequential scan is used on indexed column ???

2008-06-14 Thread Tom Lane
Michael Fuhr <[EMAIL PROTECTED]> writes:
> I created a test case that has close to the same estimated and
> actual row counts and has the same plan if I disable enable_nestloop:

There's something weird about this --- why does the second plan seqscan
b_saskaita, instead of using the bitmap scan that it had previously
estimated to be cheaper?  What PG version are you testing, and can you
provide the full test case?

(As for the original question, the hash plan seems to me to be perfectly
reasonable for the estimated row counts --- fetching one row out of
fifty using an indexscan is going to be expensive.  So I think the OP's
problem is purely a statistical one, or maybe he's in a situation where
he should reduce random_page_cost.)

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] why sequential scan is used on indexed column ???

2008-06-14 Thread Michael Fuhr
On Sat, Jun 14, 2008 at 04:59:44PM +0200, Andreas Kretschmer wrote:
> Julius Tuskenis <[EMAIL PROTECTED]> schrieb:
> > I have a question concerning performance. One of my queries take a long  
> > to execute. I tried to do "explain analyse" and I see that the  
> > sequential scan is being used, although I have indexes set on columns  
> > that are used in joins. The question is - WHY, and how to change that  
> > behavior???
> 
> Try to create an index on apsilankymai.sas_id

In the DDL that Julius posted apsilankymai doesn't have an sas_id
column.

The join is on apsilankymai.aps_saskaita = b_saskaita.sas_id.  Both
columns have an index: b_saskaita.sas_id is a primary key so it
should have an index implicitly, and apsilankymai.aps_saskaita has
an explicit CREATE INDEX statement.  The WHERE clause is on
b_saskaita.sas_subjektas, which also has an explicit CREATE INDEX
statement.  Unless I'm mistaken all relevant columns have an index.

A few of the row count estimates differ from reality:

> Hash Join  (cost=5.17..10185.89 rows=6047 width=138) (actual 
> time=10698.539..10698.539 rows=0 loops=1)

> Bitmap Heap Scan on b_saskaita  (cost=2.03..5.14 rows=9 width=96) (actual 
> time=31.473..31.489 rows=1 loops=1)

However, that might not be entirely responsible for the questionable
plan.  I created a test case that has close to the same estimated and
actual row counts and has the same plan if I disable enable_nestloop:

set enable_nestloop to off;

explain analyze
select *
FROM b_saskaita JOIN apsilankymai ON (aps_saskaita = sas_id)
where sas_subjektas = 20190;

  QUERY PLAN
  
--
 Hash Join  (cost=6.54..5814.42 rows=5406 width=286) (actual 
time=3222.429..3222.429 rows=0 loops=1)
   Hash Cond: (apsilankymai.aps_saskaita = b_saskaita.sas_id)
   ->  Seq Scan on apsilankymai  (cost=0.00..4627.50 rows=300350 width=42) 
(actual time=0.085..1514.863 rows=300350 loops=1)
   ->  Hash  (cost=6.43..6.43 rows=9 width=244) (actual time=0.122..0.122 
rows=1 loops=1)
 ->  Bitmap Heap Scan on b_saskaita  (cost=2.32..6.43 rows=9 width=244) 
(actual time=0.089..0.095 rows=1 loops=1)
   Recheck Cond: (sas_subjektas = 20190)
   ->  Bitmap Index Scan on fki_sas_subjektas  (cost=0.00..2.32 
rows=9 width=0) (actual time=0.066..0.066 rows=1 loops=1)
 Index Cond: (sas_subjektas = 20190)
 Total runtime: 3222.786 ms

I get a better plan if I enable nested loops:

set enable_nestloop to on;

explain analyze
select *
FROM b_saskaita JOIN apsilankymai ON (aps_saskaita = sas_id)
where sas_subjektas = 20190;

QUERY PLAN  
   
---
 Nested Loop  (cost=79.93..4660.23 rows=5406 width=286) (actual 
time=1.000..1.000 rows=0 loops=1)
   ->  Seq Scan on b_saskaita  (cost=0.00..10.25 rows=9 width=244) (actual 
time=0.116..0.870 rows=1 loops=1)
 Filter: (sas_subjektas = 20190)
   ->  Bitmap Heap Scan on apsilankymai  (cost=79.93..441.58 rows=6007 
width=42) (actual time=0.084..0.084 rows=0 loops=1)
 Recheck Cond: (apsilankymai.aps_saskaita = b_saskaita.sas_id)
 ->  Bitmap Index Scan on idx_aps_saskaita  (cost=0.00..78.43 rows=6007 
width=0) (actual time=0.068..0.068 rows=0 loops=1)
   Index Cond: (apsilankymai.aps_saskaita = b_saskaita.sas_id)
 Total runtime: 1.321 ms

Julius, do you perchance have enable_nestloop = off?  If so, do you
get a better plan if you enable it?  Also, have you run ANALYZE
lately?

-- 
Michael Fuhr

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] why sequential scan is used on indexed column ???

2008-06-14 Thread Andreas Kretschmer
Julius Tuskenis <[EMAIL PROTECTED]> schrieb:

> Hello.
>
> I have a question concerning performance. One of my queries take a long  
> to execute. I tried to do "explain analyse" and I see that the  
> sequential scan is being used, although I have indexes set on columns  
> that are used in joins. The question is - WHY, and how to change that  
> behavior???

Try to create an index on apsilankymai.sas_id



Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.  (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.  N 51.05082°, E 13.56889°

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general