[GENERAL] Indexing problem with OFFSET LIMIT

2008-08-29 Thread Oliver Weichhold
Hello

I have a problem in my application and don't know how to fix it.

This is the table and one of the indexes:

CREATE TABLE foo
(
  id serial NOT NULL,
  foo_name character varying(100),
  realm_id integer

  ... and about 50 other columns
)

CREATE INDEX idx_foo_name_realm
  ON foo
  USING btree
  (realm_id, foo_name);

Table foo contains about 8 million rows.


The problem:

Consider this query:

SELECT * FROM foo WHERE realm_id = 228 order by foo_name LIMIT 200 OFFSET
15000

And its execution plan:

Limit  (cost=57527.13..58294.16 rows=200 width=575) (actual time=182.302..184.971 rows=200 loops=1)
  ->  Index Scan using idx_foo_name_realm on foo  (cost=0.00..62159.98 rows=16208 width=575) (actual time=0.085..166.861 rows=15200 loops=1)
        Index Cond: (realm_id = 228)
Total runtime: 185.591 ms


And now look at this:

SELECT * FROM foo WHERE realm_id = 228 order by foo_name LIMIT 200 OFFSET
15999

Limit  (cost=59601.92..59602.42 rows=200 width=575) (actual time=1069.759..1072.310 rows=200 loops=1)
  ->  Sort  (cost=59561.92..59602.44 rows=16208 width=575) (actual time=929.948..1052.620 rows=16199 loops=1)
        Sort Key: foo_name
        Sort Method:  external merge  Disk: 8984kB
        ->  Bitmap Heap Scan on foo  (cost=306.69..54270.62 rows=16208 width=575) (actual time=9.612..235.902 rows=21788 loops=1)
              Recheck Cond: (realm_id = 228)
              ->  Bitmap Index Scan on foo_realm_id  (cost=0.00..302.64 rows=16208 width=0) (actual time=8.733..8.733 rows=21810 loops=1)
                    Index Cond: (realm_id = 228)
Total runtime: 1084.706 ms

Execution time increases roughly tenfold because Postgres stopped using the index.

Can anybody explain to me what's going on and what can be done? Is this a
memory problem?


Re: [GENERAL] Indexing problem with OFFSET LIMIT

2008-08-29 Thread David Rowley
I'm no expert at reading query plans, but I'm guessing the planner chose the
other plan because your offset + limit went beyond the row estimate.

 

Looks like it's then doing a disk-based sort in the other plan, which
probably explains why it's slow.

 

Someone please correct me if I'm wrong.

 

Perhaps if you can spare a little more work_mem in postgresql.conf it might
go back to an in-memory sort.

 

David.
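For reference, David's work_mem idea can be tried per-session before touching postgresql.conf. The plan's "external merge  Disk: 8984kB" hints that the sort needs somewhat more than the default work_mem to stay in memory; the 16MB value below is a guess, not a recommendation:

```sql
-- Raise sort memory for this session only and re-check the plan.
SET work_mem = '16MB';

EXPLAIN ANALYZE
SELECT * FROM foo
WHERE realm_id = 228
ORDER BY foo_name
LIMIT 200 OFFSET 15999;

-- Restore the server default afterwards.
RESET work_mem;
```

If the Sort node switches from "external merge" to "quicksort  Memory: ...", the setting can then be made permanent in postgresql.conf.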

 

 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Oliver Weichhold
Sent: 29 August 2008 21:38
To: pgsql-general@postgresql.org
Subject: [GENERAL] Indexing problem with OFFSET LIMIT

 




Re: [GENERAL] Indexing problem with OFFSET LIMIT

2008-08-29 Thread Merlin Moncure
On Fri, Aug 29, 2008 at 4:38 PM, Oliver Weichhold [EMAIL PROTECTED] wrote:
> Hello
>
> I have problem in my applications and don't know how to fix it.
>
> This is the table and one of the indexes:
>
> CREATE TABLE foo
> (
>   id serial NOT NULL,
>   foo_name character varying(100),
>   realm_id integer
>
>   ... and about 50 other columns
> )
>
> CREATE INDEX idx_foo_name_realm
>   ON foo
>   USING btree
>   (realm_id, foo_name);
>
> Table foo contains about 8 Million Rows.
>
>
> The problem:
>
> Consider this query:
>
> SELECT * FROM foo WHERE realm_id = 228 order by foo_name LIMIT 200 OFFSET
> 15000

try this:

SELECT * FROM foo WHERE realm_id = 228 order by realm_id, foo_name
LIMIT 200 OFFSET 15000

Or even better, don't use 'offset' at all.  It's simply lousy.  If
you want to skip ahead 200 rows at a time, save off the last
extracted row in the app:
1st time:
select * from foo order by realm_id, foo_name limit 200;
times after that:
select * from foo where (realm_id, foo_name) > (last_realm_id,
last_foo_name) order by realm_id, foo_name limit 200;

you should be pleasantly surprised :-).  This is also a little bit
more graceful if other sessions are deleting/inserting rows while you
are browsing.
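Spelled out, this keyset ("seek") technique looks like the following; the literal values (228, 'mumble') are placeholders for whatever row ended the previous page:

```sql
-- First page: no predicate, just the composite-index order.
SELECT * FROM foo
ORDER BY realm_id, foo_name
LIMIT 200;

-- Later pages: resume just past the last row seen, with a row-value
-- comparison the (realm_id, foo_name) btree index can satisfy directly.
SELECT * FROM foo
WHERE (realm_id, foo_name) > (228, 'mumble')
ORDER BY realm_id, foo_name
LIMIT 200;
```

Because the comparison matches the index's column order, each page is an index scan starting at the saved key, so the cost per page stays constant no matter how deep you browse.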

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Indexing problem with OFFSET LIMIT

2008-08-29 Thread Tom Lane
Merlin Moncure [EMAIL PROTECTED] writes:
> On Fri, Aug 29, 2008 at 4:38 PM, Oliver Weichhold [EMAIL PROTECTED] wrote:
>> Consider this query:
>>
>> SELECT * FROM foo WHERE realm_id = 228 order by foo_name LIMIT 200 OFFSET
>> 15000
>
> try this:
> SELECT * FROM foo WHERE realm_id = 228 order by realm_id, foo_name
> LIMIT 200 OFFSET 15000
>
> Or even better don't use 'offset' at all.  It's simply lousy.  If
> you want to skip ahead 200 rows at a time, save off the previous last
> extracted rows in the app:

Yeah, large OFFSET values are always going to suck performance-wise,
because there is no magic way to skip over those rows --- the backend
has to read them, and then throw them away internally.  Avoid OFFSET.

Aside from the type of trick Merlin mentions, have you considered using
a cursor?
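A cursor keeps one ordered scan open across fetches, so the rows before your page are skipped once rather than re-read for every OFFSET query. A minimal sketch (cursor name and page position are illustrative):

```sql
BEGIN;

DECLARE foo_cur CURSOR FOR
    SELECT * FROM foo
    WHERE realm_id = 228
    ORDER BY realm_id, foo_name;

-- Skipping is still linear, but a single MOVE is cheaper than
-- re-running the whole query with a growing OFFSET for each page.
MOVE FORWARD 15000 IN foo_cur;

FETCH FORWARD 200 FROM foo_cur;   -- one page of 200 rows
FETCH FORWARD 200 FROM foo_cur;   -- next page, no re-scan from the start

CLOSE foo_cur;
COMMIT;
```

Note the cursor only lives as long as its transaction (unless declared WITH HOLD), so this fits batch-style browsing better than stateless web paging.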

regards, tom lane
