Re: [GENERAL] Postgres 8.3 only uses seq scan
On 11/26/2008 06:44 PM, [EMAIL PROTECTED] wrote: > Try running EXPLAIN ANALYZE - that gives much more information. For > example it may show differences in number of rows between the two > machines, that the statistics are not up to date, etc. Thanks a lot for this tip -- [ Clemens Schwaighofer -=:~ ] [ IT Engineer/Manager] [ E-Graphics Communications, TEQUILA\ Japan IT Group ] [6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ] [ Tel: +81-(0)3-3545-7706Fax: +81-(0)3-3545-7343 ] [ http://www.tequila.jp ] signature.asc Description: OpenPGP digital signature
Re: [GENERAL] Postgres 8.3 only uses seq scan
Try running EXPLAIN ANALYZE - that gives much more information. For example it may show differences in number of rows between the two machines, that the statistics are not up to date, etc. regards Tomas > Hi, > > I have system here with Debian/Testing and the latest 8.2 and 8.3 > database installed. > > on a blank database I create two very simple tables > > Table "public.foo" > Column | Type| Modifiers > +---+-- > foo_id | integer | not null default > nextval('foo_foo_id_seq'::regclass) > test | character varying | > Indexes: > "foo_pkey" PRIMARY KEY, btree (foo_id) > > > Table "public.bar" > Column | Type| Modifiers > +---+-- > bar_id | integer | not null default > nextval('bar_bar_id_seq'::regclass) > foo_id | integer | not null > test | character varying | > Indexes: > "bar_pkey" PRIMARY KEY, btree (bar_id) > "bar_foo_id_idx" btree (foo_id) > Foreign-key constraints: > "bar_foo_id_fkey" FOREIGN KEY (foo_id) REFERENCES foo(foo_id) MATCH > FULL ON UPDATE CASCADE ON DELETE CASCADE > > now if I run a simple join query over both tables Postgres 8.2 gives > this back for the explain: > > # explain select * from foo f, bar b where f.foo_id = b.foo_id; > QUERY PLAN > > Nested Loop (cost=0.00..33.14 rows=3 width=76) >-> Index Scan using bar_foo_id_idx on bar b (cost=0.00..12.30 > rows=3 width=40) >-> Index Scan using foo_pkey on foo f (cost=0.00..6.93 rows=1 > width=36) > Index Cond: (f.foo_id = b.foo_id) > > > but on the 8.3 version i get this back > > # explain select * from foo f, bar b where f.foo_id = b.foo_id; > QUERY PLAN > -- > Hash Join (cost=1.07..2.14 rows=3 width=24) >Hash Cond: (b.foo_id = f.foo_id) >-> Seq Scan on bar b (cost=0.00..1.03 rows=3 width=14) >-> Hash (cost=1.03..1.03 rows=3 width=10) > -> Seq Scan on foo f (cost=0.00..1.03 rows=3 width=10) > > once I insert a million rows he does use the index: > > # explain select * from foo f, bar b where f.foo_id = b.foo_id; > QUERY PLAN > --- > Nested Loop (cost=0.00..26.39 rows=9 width=35) >-> Seq Scan on foo f (cost=0.00..1.03 rows=3 width=21) >-> Index Scan using bar_foo_id_idx on bar b (cost=0.00..8.42 rows=3 > width=14) > Index Cond: (b.foo_id = f.foo_id) > > > I have seen this behavior on all of my postgres 8.3 installs. The > indexes are there, auto vacuum is turned on. even a reindex of the > tables does not help. The configuration files are identical in grounds > of memory usage, query planning, etc. > > I see this on RPM packages for RedHat Enterprise, self compiled for > FreeBSD 4, and debian packages. I am seriously very very confused. > > What can I do to debug this further, or find out why this happens? > > Does this mean Postgres 8.3 thinks a sequence scan is faster than an > index scan? Even on tables with hundred thousands rows? > > -- > [ Clemens Schwaighofer -=:~ ] > [ IT Engineer/Manager] > [ E-Graphics Communications, TEQUILA\ Japan IT Group ] > [6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ] > [ Tel: +81-(0)3-3545-7706Fax: +81-(0)3-3545-7343 ] > [ http://www.tequila.jp ] > > -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Postgres 8.3 only uses seq scan
On 11/26/2008 03:20 PM, Scott Marlowe wrote: > On Tue, Nov 25, 2008 at 10:22 PM, Clemens Schwaighofer > <[EMAIL PROTECTED]> wrote: >> On 11/26/2008 02:15 PM, Scott Marlowe wrote: >>> Are there more rows in the 8.2 table you're testing on? Or is the >>> whole table small enough to fit on a few pages? >> I highly doubt that. I have right now in one of the DBs I transfered >> tables from ~100.000 down to ~40.000 rows that all join together. I >> somehow really doubt that fit in a few pages. > > Right, with more rows, and choosing fewer, pgsql will go for an index > scan. if choosing a good %, the seq scan. okay, then I think I get it. One of my test queries was actually selecting a very big chunk (90%) of the data, so it makes sense the planner chooses seq scan over index scan here. >> So, I am fine when I trust the Postgresql planner :) Because speed wise >> I see no difference that 8.3 would be slower than 8.2 > > Well, the planner's not perfect. Some off corner cases can catch it > out, or if your database isn't analyzed after a lot of changes it may > make an uninformed decision. But most the time it makes the right, or > close enough, decision. Well, I have autovacuum turned on, so this should hopefully keep the planner up to date. Anyway, thanks a lot for your help. -- [ Clemens Schwaighofer -=:~ ] [ IT Engineer/Manager] [ E-Graphics Communications, TEQUILA\ Japan IT Group ] [6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ] [ Tel: +81-(0)3-3545-7706Fax: +81-(0)3-3545-7343 ] [ http://www.tequila.jp ] signature.asc Description: OpenPGP digital signature
Re: [GENERAL] Postgres 8.3 only uses seq scan
On Tue, Nov 25, 2008 at 10:22 PM, Clemens Schwaighofer <[EMAIL PROTECTED]> wrote: > On 11/26/2008 02:15 PM, Scott Marlowe wrote: >> >> Are there more rows in the 8.2 table you're testing on? Or is the >> whole table small enough to fit on a few pages? > > I highly doubt that. I have right now in one of the DBs I transfered > tables from ~100.000 down to ~40.000 rows that all join together. I > somehow really doubt that fit in a few pages. Right, with more rows, and choosing fewer, pgsql will go for an index scan. if choosing a good %, the seq scan. > So, I am fine when I trust the Postgresql planner :) Because speed wise > I see no difference that 8.3 would be slower than 8.2 Well, the planner's not perfect. Some off corner cases can catch it out, or if your database isn't analyzed after a lot of changes it may make an uninformed decision. But most the time it makes the right, or close enough, decision. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Postgres 8.3 only uses seq scan
On 11/26/2008 02:15 PM, Scott Marlowe wrote: > On Tue, Nov 25, 2008 at 10:07 PM, Clemens Schwaighofer > <[EMAIL PROTECTED]> wrote: >> On 11/26/2008 02:04 PM, Scott Marlowe wrote: >>> On Tue, Nov 25, 2008 at 8:39 PM, Clemens Schwaighofer >>> <[EMAIL PROTECTED]> wrote: but on the 8.3 version i get this back # explain select * from foo f, bar b where f.foo_id = b.foo_id; QUERY PLAN -- Hash Join (cost=1.07..2.14 rows=3 width=24) Hash Cond: (b.foo_id = f.foo_id) -> Seq Scan on bar b (cost=0.00..1.03 rows=3 width=14) -> Hash (cost=1.03..1.03 rows=3 width=10) -> Seq Scan on foo f (cost=0.00..1.03 rows=3 width=10) >>> Of course it uses a seq scan. All the data fits handily into a single >>> page I assume. >> okay, the strange thing is, that in 8.2 it always used an index scan. > > Are there more rows in the 8.2 table you're testing on? Or is the > whole table small enough to fit on a few pages? I highly doubt that. I have right now in one of the DBs I transfered tables from ~100.000 down to ~40.000 rows that all join together. I somehow really doubt that fit in a few pages. That is why I was so surprised to see such a big difference in the explain. > once I insert a million rows he does use the index: # explain select * from foo f, bar b where f.foo_id = b.foo_id; QUERY PLAN --- Nested Loop (cost=0.00..26.39 rows=9 width=35) -> Seq Scan on foo f (cost=0.00..1.03 rows=3 width=21) -> Index Scan using bar_foo_id_idx on bar b (cost=0.00..8.42 rows=3 width=14) Index Cond: (b.foo_id = f.foo_id) >>> I don't see a million rows here, only three. Have you run analyze >>> after loading all that data? Or is it retrieving 3 rows out of a >>> million? If so then an index scan does make sense. >> yeah, there are 3 matching rows, and the rest is just data to make the >> table big. >> >> I am just still confused, because if Postgres does only use seq scan >> even in very large databases, I am worried I do something very wrong in >> my DB design ... > > Postgresql has no visibility in its indexes, meaning that whether it > uses an index or not, it still has to go to the table to see if the > tuple is actually visible to this transaction. For this reason, > PostgreSQL switches to sequential scans quicker than other dbs that > have visibility information in their indexes. > > The planner is pretty smart, but if you're going to hit a large % of > the table anyway, it switches to sequential scans since it will have > to retreive the majority of the table anyway. So, I am fine when I trust the Postgresql planner :) Because speed wise I see no difference that 8.3 would be slower than 8.2 -- [ Clemens Schwaighofer -=:~ ] [ IT Engineer/Manager] [ E-Graphics Communications, TEQUILA\ Japan IT Group ] [6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ] [ Tel: +81-(0)3-3545-7706Fax: +81-(0)3-3545-7343 ] [ http://www.tequila.jp ] signature.asc Description: OpenPGP digital signature
Re: [GENERAL] Postgres 8.3 only uses seq scan
On Tue, Nov 25, 2008 at 10:07 PM, Clemens Schwaighofer <[EMAIL PROTECTED]> wrote: > On 11/26/2008 02:04 PM, Scott Marlowe wrote: >> On Tue, Nov 25, 2008 at 8:39 PM, Clemens Schwaighofer >> <[EMAIL PROTECTED]> wrote: >>> but on the 8.3 version i get this back >>> >>> # explain select * from foo f, bar b where f.foo_id = b.foo_id; >>>QUERY PLAN >>> -- >>> Hash Join (cost=1.07..2.14 rows=3 width=24) >>> Hash Cond: (b.foo_id = f.foo_id) >>> -> Seq Scan on bar b (cost=0.00..1.03 rows=3 width=14) >>> -> Hash (cost=1.03..1.03 rows=3 width=10) >>> -> Seq Scan on foo f (cost=0.00..1.03 rows=3 width=10) >> >> Of course it uses a seq scan. All the data fits handily into a single >> page I assume. > > okay, the strange thing is, that in 8.2 it always used an index scan. Are there more rows in the 8.2 table you're testing on? Or is the whole table small enough to fit on a few pages? >>> once I insert a million rows he does use the index: >>> >>> # explain select * from foo f, bar b where f.foo_id = b.foo_id; >>>QUERY PLAN >>> --- >>> Nested Loop (cost=0.00..26.39 rows=9 width=35) >>> -> Seq Scan on foo f (cost=0.00..1.03 rows=3 width=21) >>> -> Index Scan using bar_foo_id_idx on bar b (cost=0.00..8.42 rows=3 >>> width=14) >>> Index Cond: (b.foo_id = f.foo_id) >> >> I don't see a million rows here, only three. Have you run analyze >> after loading all that data? Or is it retrieving 3 rows out of a >> million? If so then an index scan does make sense. > > yeah, there are 3 matching rows, and the rest is just data to make the > table big. > > I am just still confused, because if Postgres does only use seq scan > even in very large databases, I am worried I do something very wrong in > my DB design ... Postgresql has no visibility in its indexes, meaning that whether it uses an index or not, it still has to go to the table to see if the tuple is actually visible to this transaction. For this reason, PostgreSQL switches to sequential scans quicker than other dbs that have visibility information in their indexes. The planner is pretty smart, but if you're going to hit a large % of the table anyway, it switches to sequential scans since it will have to retreive the majority of the table anyway. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Postgres 8.3 only uses seq scan
On 11/26/2008 02:04 PM, Scott Marlowe wrote: > On Tue, Nov 25, 2008 at 8:39 PM, Clemens Schwaighofer > <[EMAIL PROTECTED]> wrote: >> but on the 8.3 version i get this back >> >> # explain select * from foo f, bar b where f.foo_id = b.foo_id; >>QUERY PLAN >> -- >> Hash Join (cost=1.07..2.14 rows=3 width=24) >> Hash Cond: (b.foo_id = f.foo_id) >> -> Seq Scan on bar b (cost=0.00..1.03 rows=3 width=14) >> -> Hash (cost=1.03..1.03 rows=3 width=10) >> -> Seq Scan on foo f (cost=0.00..1.03 rows=3 width=10) > > Of course it uses a seq scan. All the data fits handily into a single > page I assume. okay, the strange thing is, that in 8.2 it always used an index scan. >> once I insert a million rows he does use the index: >> >> # explain select * from foo f, bar b where f.foo_id = b.foo_id; >>QUERY PLAN >> --- >> Nested Loop (cost=0.00..26.39 rows=9 width=35) >> -> Seq Scan on foo f (cost=0.00..1.03 rows=3 width=21) >> -> Index Scan using bar_foo_id_idx on bar b (cost=0.00..8.42 rows=3 >> width=14) >> Index Cond: (b.foo_id = f.foo_id) > > I don't see a million rows here, only three. Have you run analyze > after loading all that data? Or is it retrieving 3 rows out of a > million? If so then an index scan does make sense. yeah, there are 3 matching rows, and the rest is just data to make the table big. I am just still confused, because if Postgres does only use seq scan even in very large databases, I am worried I do something very wrong in my DB design ... -- [ Clemens Schwaighofer -=:~ ] [ IT Engineer/Manager] [ E-Graphics Communications, TEQUILA\ Japan IT Group ] [6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ] [ Tel: +81-(0)3-3545-7706Fax: +81-(0)3-3545-7343 ] [ http://www.tequila.jp ] signature.asc Description: OpenPGP digital signature
Re: [GENERAL] Postgres 8.3 only uses seq scan
On Tue, Nov 25, 2008 at 8:39 PM, Clemens Schwaighofer <[EMAIL PROTECTED]> wrote: > but on the 8.3 version i get this back > > # explain select * from foo f, bar b where f.foo_id = b.foo_id; >QUERY PLAN > -- > Hash Join (cost=1.07..2.14 rows=3 width=24) > Hash Cond: (b.foo_id = f.foo_id) > -> Seq Scan on bar b (cost=0.00..1.03 rows=3 width=14) > -> Hash (cost=1.03..1.03 rows=3 width=10) > -> Seq Scan on foo f (cost=0.00..1.03 rows=3 width=10) Of course it uses a seq scan. All the data fits handily into a single page I assume. > once I insert a million rows he does use the index: > > # explain select * from foo f, bar b where f.foo_id = b.foo_id; >QUERY PLAN > --- > Nested Loop (cost=0.00..26.39 rows=9 width=35) > -> Seq Scan on foo f (cost=0.00..1.03 rows=3 width=21) > -> Index Scan using bar_foo_id_idx on bar b (cost=0.00..8.42 rows=3 > width=14) > Index Cond: (b.foo_id = f.foo_id) I don't see a million rows here, only three. Have you run analyze after loading all that data? Or is it retrieving 3 rows out of a million? If so then an index scan does make sense. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Postgres 8.3 only uses seq scan
Hi, I have system here with Debian/Testing and the latest 8.2 and 8.3 database installed. on a blank database I create two very simple tables Table "public.foo" Column | Type| Modifiers +---+-- foo_id | integer | not null default nextval('foo_foo_id_seq'::regclass) test | character varying | Indexes: "foo_pkey" PRIMARY KEY, btree (foo_id) Table "public.bar" Column | Type| Modifiers +---+-- bar_id | integer | not null default nextval('bar_bar_id_seq'::regclass) foo_id | integer | not null test | character varying | Indexes: "bar_pkey" PRIMARY KEY, btree (bar_id) "bar_foo_id_idx" btree (foo_id) Foreign-key constraints: "bar_foo_id_fkey" FOREIGN KEY (foo_id) REFERENCES foo(foo_id) MATCH FULL ON UPDATE CASCADE ON DELETE CASCADE now if I run a simple join query over both tables Postgres 8.2 gives this back for the explain: # explain select * from foo f, bar b where f.foo_id = b.foo_id; QUERY PLAN Nested Loop (cost=0.00..33.14 rows=3 width=76) -> Index Scan using bar_foo_id_idx on bar b (cost=0.00..12.30 rows=3 width=40) -> Index Scan using foo_pkey on foo f (cost=0.00..6.93 rows=1 width=36) Index Cond: (f.foo_id = b.foo_id) but on the 8.3 version i get this back # explain select * from foo f, bar b where f.foo_id = b.foo_id; QUERY PLAN -- Hash Join (cost=1.07..2.14 rows=3 width=24) Hash Cond: (b.foo_id = f.foo_id) -> Seq Scan on bar b (cost=0.00..1.03 rows=3 width=14) -> Hash (cost=1.03..1.03 rows=3 width=10) -> Seq Scan on foo f (cost=0.00..1.03 rows=3 width=10) once I insert a million rows he does use the index: # explain select * from foo f, bar b where f.foo_id = b.foo_id; QUERY PLAN --- Nested Loop (cost=0.00..26.39 rows=9 width=35) -> Seq Scan on foo f (cost=0.00..1.03 rows=3 width=21) -> Index Scan using bar_foo_id_idx on bar b (cost=0.00..8.42 rows=3 width=14) Index Cond: (b.foo_id = f.foo_id) I have seen this behavior on all of my postgres 8.3 installs. The indexes are there, auto vacuum is turned on. even a reindex of the tables does not help. The configuration files are identical in grounds of memory usage, query planning, etc. I see this on RPM packages for RedHat Enterprise, self compiled for FreeBSD 4, and debian packages. I am seriously very very confused. What can I do to debug this further, or find out why this happens? Does this mean Postgres 8.3 thinks a sequence scan is faster than an index scan? Even on tables with hundred thousands rows? -- [ Clemens Schwaighofer -=:~ ] [ IT Engineer/Manager] [ E-Graphics Communications, TEQUILA\ Japan IT Group ] [6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ] [ Tel: +81-(0)3-3545-7706Fax: +81-(0)3-3545-7343 ] [ http://www.tequila.jp ] signature.asc Description: PGP signature signature.asc Description: OpenPGP digital signature