Re: [GENERAL] why sequential scan is used on indexed column ???
On Sat, Jun 14, 2008 at 02:35:38PM -0400, Tom Lane wrote: > Michael Fuhr <[EMAIL PROTECTED]> writes: > > I created a test case that has close to the same estimated and > > actual row counts and has the same plan if I disable enable_nestloop: > > There's something weird about this --- why does the second plan seqscan > b_saskaita, instead of using the bitmap scan that it had previously > estimated to be cheaper? Dunno. > What PG version are you testing, and can you provide the full test case? My test was in 8.2.9, the only version I had handy at the time. I later tested 8.1.13 (Julius said he was running 8.1.4) and got the same plan that Julius got without messing with planner settings. I don't have access to my test case right now but I'll post it when I get a chance. I simply populated the tables with random data, adjusting the amount and distribution until I got row count estimates close to what Julius got. I don't know if my test case is close enough to Julius's data to be relevant to his problem but if you think my results are weird then maybe I've stumbled across something else that's interesting. > (As for the original question, the hash plan seems to me to be perfectly > reasonable for the estimated row counts --- fetching one row out of > fifty using an indexscan is going to be expensive. So I think the OP's > problem is purely a statistical one, or maybe he's in a situation where > he should reduce random_page_cost.) Hmmm...8.1.13 wants to do the hash join that you think would be reasonable but 8.2.9 prefers the nested loop as in my second example. I think I did have a reduced random_page_cost (2 as I recall). -- Michael Fuhr -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] why sequential scan is used on indexed column ???
Hi Michael. Thank you for your answer. I've checked - enable_nestloop is true. I did ANALYZE, but that didn't help. The sequential scan is still used Any more ideas why? Julius Tuskenis Michael Fuhr rašė: On Sat, Jun 14, 2008 at 04:59:44PM +0200, Andreas Kretschmer wrote: Julius Tuskenis <[EMAIL PROTECTED]> schrieb: I have a question concerning performance. One of my queries take a long to execute. I tried to do "explain analyse" and I see that the sequential scan is being used, although I have indexes set on columns that are used in joins. The question is - WHY, and how to change that behavior??? Try to create an index on apsilankymai.sas_id In the DDL that Julius posted apsilankymai doesn't have an sas_id column. The join is on apsilankymai.aps_saskaita = b_saskaita.sas_id. Both columns have an index: b_saskaita.sas_id is a primary key so it should have an index implicitly, and apsilankymai.aps_saskaita has an explicit CREATE INDEX statement. The WHERE clause is on b_saskaita.sas_subjektas, which also has an explicit CREATE INDEX statement. Unless I'm mistaken all relevant columns have an index. A few of the row count estimates differ from reality: Hash Join (cost=5.17..10185.89 rows=6047 width=138) (actual time=10698.539..10698.539 rows=0 loops=1) Bitmap Heap Scan on b_saskaita (cost=2.03..5.14 rows=9 width=96) (actual time=31.473..31.489 rows=1 loops=1) However, that might not be entirely responsible for the questionable plan. I created a test case that has close to the same estimated and actual row counts and has the same plan if I disable enable_nestloop: set enable_nestloop to off; explain analyze select * FROM b_saskaita JOIN apsilankymai ON (aps_saskaita = sas_id) where sas_subjektas = 20190; QUERY PLAN -- Hash Join (cost=6.54..5814.42 rows=5406 width=286) (actual time=3222.429..3222.429 rows=0 loops=1) Hash Cond: (apsilankymai.aps_saskaita = b_saskaita.sas_id) -> Seq Scan on apsilankymai (cost=0.00..4627.50 rows=300350 width=42) (actual time=0.085..1514.863 rows=300350 loops=1) -> Hash (cost=6.43..6.43 rows=9 width=244) (actual time=0.122..0.122 rows=1 loops=1) -> Bitmap Heap Scan on b_saskaita (cost=2.32..6.43 rows=9 width=244) (actual time=0.089..0.095 rows=1 loops=1) Recheck Cond: (sas_subjektas = 20190) -> Bitmap Index Scan on fki_sas_subjektas (cost=0.00..2.32 rows=9 width=0) (actual time=0.066..0.066 rows=1 loops=1) Index Cond: (sas_subjektas = 20190) Total runtime: 3222.786 ms I get a better plan if I enable nested loops: set enable_nestloop to on; explain analyze select * FROM b_saskaita JOIN apsilankymai ON (aps_saskaita = sas_id) where sas_subjektas = 20190; QUERY PLAN --- Nested Loop (cost=79.93..4660.23 rows=5406 width=286) (actual time=1.000..1.000 rows=0 loops=1) -> Seq Scan on b_saskaita (cost=0.00..10.25 rows=9 width=244) (actual time=0.116..0.870 rows=1 loops=1) Filter: (sas_subjektas = 20190) -> Bitmap Heap Scan on apsilankymai (cost=79.93..441.58 rows=6007 width=42) (actual time=0.084..0.084 rows=0 loops=1) Recheck Cond: (apsilankymai.aps_saskaita = b_saskaita.sas_id) -> Bitmap Index Scan on idx_aps_saskaita (cost=0.00..78.43 rows=6007 width=0) (actual time=0.068..0.068 rows=0 loops=1) Index Cond: (apsilankymai.aps_saskaita = b_saskaita.sas_id) Total runtime: 1.321 ms Julius, do you perchance have enable_nestloop = off? If so, do you get a better plan if you enable it? Also, have you run ANALYZE lately? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] why sequential scan is used on indexed column ???
Hello, Tom. So I think the OP's problem is purely a statistical one, or maybe he's in a situation where he should reduce random_page_cost.) What could be done solving that "statistical problem"? :) Current value for random_page_cost is 4. What value would you suggest? Julius Tuskenis -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] why sequential scan is used on indexed column ???
Michael Fuhr <[EMAIL PROTECTED]> schrieb: > On Sat, Jun 14, 2008 at 04:59:44PM +0200, Andreas Kretschmer wrote: > > Julius Tuskenis <[EMAIL PROTECTED]> schrieb: > > > I have a question concerning performance. One of my queries take a long > > > to execute. I tried to do "explain analyse" and I see that the > > > sequential scan is being used, although I have indexes set on columns > > > that are used in joins. The question is - WHY, and how to change that > > > behavior??? > > > > Try to create an index on apsilankymai.sas_id > > In the DDL that Julius posted apsilankymai doesn't have an sas_id > column. > > The join is on apsilankymai.aps_saskaita = b_saskaita.sas_id. Both > columns have an index: b_saskaita.sas_id is a primary key so it > should have an index implicitly, and apsilankymai.aps_saskaita has Right, my mistake. Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889° -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] why sequential scan is used on indexed column ???
Michael Fuhr <[EMAIL PROTECTED]> writes: > I created a test case that has close to the same estimated and > actual row counts and has the same plan if I disable enable_nestloop: There's something weird about this --- why does the second plan seqscan b_saskaita, instead of using the bitmap scan that it had previously estimated to be cheaper? What PG version are you testing, and can you provide the full test case? (As for the original question, the hash plan seems to me to be perfectly reasonable for the estimated row counts --- fetching one row out of fifty using an indexscan is going to be expensive. So I think the OP's problem is purely a statistical one, or maybe he's in a situation where he should reduce random_page_cost.) regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] why sequential scan is used on indexed column ???
On Sat, Jun 14, 2008 at 04:59:44PM +0200, Andreas Kretschmer wrote: > Julius Tuskenis <[EMAIL PROTECTED]> schrieb: > > I have a question concerning performance. One of my queries take a long > > to execute. I tried to do "explain analyse" and I see that the > > sequential scan is being used, although I have indexes set on columns > > that are used in joins. The question is - WHY, and how to change that > > behavior??? > > Try to create an index on apsilankymai.sas_id In the DDL that Julius posted apsilankymai doesn't have an sas_id column. The join is on apsilankymai.aps_saskaita = b_saskaita.sas_id. Both columns have an index: b_saskaita.sas_id is a primary key so it should have an index implicitly, and apsilankymai.aps_saskaita has an explicit CREATE INDEX statement. The WHERE clause is on b_saskaita.sas_subjektas, which also has an explicit CREATE INDEX statement. Unless I'm mistaken all relevant columns have an index. A few of the row count estimates differ from reality: > Hash Join (cost=5.17..10185.89 rows=6047 width=138) (actual > time=10698.539..10698.539 rows=0 loops=1) > Bitmap Heap Scan on b_saskaita (cost=2.03..5.14 rows=9 width=96) (actual > time=31.473..31.489 rows=1 loops=1) However, that might not be entirely responsible for the questionable plan. I created a test case that has close to the same estimated and actual row counts and has the same plan if I disable enable_nestloop: set enable_nestloop to off; explain analyze select * FROM b_saskaita JOIN apsilankymai ON (aps_saskaita = sas_id) where sas_subjektas = 20190; QUERY PLAN -- Hash Join (cost=6.54..5814.42 rows=5406 width=286) (actual time=3222.429..3222.429 rows=0 loops=1) Hash Cond: (apsilankymai.aps_saskaita = b_saskaita.sas_id) -> Seq Scan on apsilankymai (cost=0.00..4627.50 rows=300350 width=42) (actual time=0.085..1514.863 rows=300350 loops=1) -> Hash (cost=6.43..6.43 rows=9 width=244) (actual time=0.122..0.122 rows=1 loops=1) -> Bitmap Heap Scan on b_saskaita (cost=2.32..6.43 rows=9 width=244) (actual time=0.089..0.095 rows=1 loops=1) Recheck Cond: (sas_subjektas = 20190) -> Bitmap Index Scan on fki_sas_subjektas (cost=0.00..2.32 rows=9 width=0) (actual time=0.066..0.066 rows=1 loops=1) Index Cond: (sas_subjektas = 20190) Total runtime: 3222.786 ms I get a better plan if I enable nested loops: set enable_nestloop to on; explain analyze select * FROM b_saskaita JOIN apsilankymai ON (aps_saskaita = sas_id) where sas_subjektas = 20190; QUERY PLAN --- Nested Loop (cost=79.93..4660.23 rows=5406 width=286) (actual time=1.000..1.000 rows=0 loops=1) -> Seq Scan on b_saskaita (cost=0.00..10.25 rows=9 width=244) (actual time=0.116..0.870 rows=1 loops=1) Filter: (sas_subjektas = 20190) -> Bitmap Heap Scan on apsilankymai (cost=79.93..441.58 rows=6007 width=42) (actual time=0.084..0.084 rows=0 loops=1) Recheck Cond: (apsilankymai.aps_saskaita = b_saskaita.sas_id) -> Bitmap Index Scan on idx_aps_saskaita (cost=0.00..78.43 rows=6007 width=0) (actual time=0.068..0.068 rows=0 loops=1) Index Cond: (apsilankymai.aps_saskaita = b_saskaita.sas_id) Total runtime: 1.321 ms Julius, do you perchance have enable_nestloop = off? If so, do you get a better plan if you enable it? Also, have you run ANALYZE lately? -- Michael Fuhr -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] why sequential scan is used on indexed column ???
Julius Tuskenis <[EMAIL PROTECTED]> schrieb: > Hello. > > I have a question concerning performance. One of my queries take a long > to execute. I tried to do "explain analyse" and I see that the > sequential scan is being used, although I have indexes set on columns > that are used in joins. The question is - WHY, and how to change that > behavior??? Try to create an index on apsilankymai.sas_id Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889° -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general