[PERFORM] Slow SELECT by primary key? Postgres 9.1.2

2013-05-27 Thread John Mudd
Postgres 9.1.2 on Ubuntu 12.04

Any reason why a select by primary key would be slower than a select that
includes an ORDER BY? I was really hoping using the primary key would give
me a boost.

I stopped the server and cleared the O/S cache using "sync; echo 3 >
/proc/sys/vm/drop_caches" between the runs.



test=# VACUUM ANALYZE test_select;
VACUUM

(stopped postgres; reset O/S cache; started postgres)

test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 ORDER
BY key1, key2, key3, id LIMIT 1;
  QUERY PLAN
--
 Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600
rows=1 loops=1)
   ->  Index Scan using my_key on test_select  (cost=0.00..41895.49
rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)
 Index Cond: (key1 >= 50)
 Total runtime: 12.678 ms

(stopped postgres; reset O/S cache; started postgres)

test=# explain analyze SELECT * FROM test_select WHERE id = 50;
   QUERY PLAN

-
 Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1
width=21) (actual time=31.396..31.398 rows=1 loops=1)
   Index Cond: (id = 50)
 Total runtime: 31.504 ms



Schema:

test=# \d test_select
Table "public.test_select"
 Column | Type |Modifiers

+--+--
 id | integer  | not null default
nextval('test_select_id_seq'::regclass)
 key1   | integer  |
 key2   | integer  |
 key3   | integer  |
 data   | character(4) |
Indexes:
"test_select_pkey" PRIMARY KEY, btree (id)
"my_key" btree (key1, key2, key3, id)

test=#



Sample data:

test=# SELECT * FROM test_select LIMIT 10;
 id |  key1  |  key2  |  key3  | data
++++--
  1 | 984966 | 283954 | 772063 | x
  2 | 817668 | 393533 | 924888 | x
  3 | 751039 | 798753 | 454309 | x
  4 | 128505 | 329643 | 280553 | x
  5 | 105600 | 257225 | 710015 | x
  6 | 323891 | 615614 |  83206 | x
  7 | 194054 |  63506 | 353171 | x
  8 | 212068 | 881225 | 271804 | x
  9 | 644180 |  26693 | 200738 | x
 10 | 136586 | 498699 | 554417 | x
(10 rows)




Here's how I populated the table:

import psycopg2

conn = psycopg2.connect('dbname=test')

cur = conn.cursor()

def random_int():
n = 100
return random.randint(0,n)

def random_key():
return random_int(), random_int(), random_int()

def create_table():
cur.execute('''
DROP TABLE IF EXISTS test_select;

CREATE TABLE test_select (
id  SERIAL PRIMARY KEY,
key1INTEGER,
key2INTEGER,
key3INTEGER,
datachar(4)
);
''')
conn.commit()

n = 100
for i in range(n):
cur.execute("INSERT INTO test_select(key1, key2, key3, data)
VALUES(%s, %s, %s, 'x')", random_key())
conn.commit()

cur.execute('CREATE INDEX my_key ON test_select(key1, key2, key3, id)')
conn.commit()

create_table()


Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2

2013-05-27 Thread John Mudd
Thanks, that's easy enough to test. Didn't seem to help though.


test=# REINDEX index test_select_pkey;
REINDEX
test=# VACUUM ANALYZE test_select ;
VACUUM


(stopped postgres; reset O/S cache; started postgres)

test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 ORDER
BY key1, key2, key3, id LIMIT 1;
  QUERY PLAN

--
 Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=16.368..16.369
rows=1 loops=1)
   ->  Index Scan using my_key on test_select  (cost=0.00..41981.16
rows=501333 width=21) (actual time=16.366..16.366 rows=1 loops=1)
 Index Cond: (key1 >= 50)
 Total runtime: 16.444 ms


(stopped postgres; reset O/S cache; started postgres)

test=# explain analyze SELECT * FROM test_select WHERE id = 50;
   QUERY PLAN
-
 Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1
width=21) (actual time=23.072..23.074 rows=1 loops=1)
   Index Cond: (id = 50)
 Total runtime: 23.192 ms




On Mon, May 27, 2013 at 10:21 AM, Evgeny Shishkin wrote:

>
> On May 27, 2013, at 6:02 PM, John Mudd  wrote:
>
> > Postgres 9.1.2 on Ubuntu 12.04
> >
> > Any reason why a select by primary key would be slower than a select
> that includes an ORDER BY? I was really hoping using the primary key would
> give me a boost.
> >
>
> You created my_key after data loading, and PK was there all the time.
> If you REINDEX PK, i bet it will be as fast.
>
> > I stopped the server and cleared the O/S cache using "sync; echo 3 >
> /proc/sys/vm/drop_caches" between the runs.
> >
> >
> >
> > test=# VACUUM ANALYZE test_select;
> > VACUUM
> >
> > (stopped postgres; reset O/S cache; started postgres)
> >
> > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50
> ORDER BY key1, key2, key3, id LIMIT 1;
> >   QUERY PLAN
> >
> --
> >  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600
> rows=1 loops=1)
> >->  Index Scan using my_key on test_select  (cost=0.00..41895.49
> rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)
> >  Index Cond: (key1 >= 50)
> >  Total runtime: 12.678 ms
> >
> > (stopped postgres; reset O/S cache; started postgres)
> >
> > test=# explain analyze SELECT * FROM test_select WHERE id = 50;
> >QUERY PLAN
> >
> -
> >  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36
> rows=1 width=21) (actual time=31.396..31.398 rows=1 loops=1)
> >Index Cond: (id = 50)
> >  Total runtime: 31.504 ms
> >
> >
> >
> > Schema:
> >
> > test=# \d test_select
> > Table "public.test_select"
> >  Column | Type |Modifiers
> >
> +--+--
> >  id | integer  | not null default
> nextval('test_select_id_seq'::regclass)
> >  key1   | integer  |
> >  key2   | integer  |
> >  key3   | integer  |
> >  data   | character(4) |
> > Indexes:
> > "test_select_pkey" PRIMARY KEY, btree (id)
> > "my_key" btree (key1, key2, key3, id)
> >
> > test=#
> >
> >
> >
> > Sample data:
> >
> > test=# SELECT * FROM test_select LIMIT 10;
> >  id |  key1  |  key2  |  key3  | data
> > ++++--
> >   1 | 984966 | 283954 | 772063 | x
> >   2 | 817668 | 393533 | 924888 | x
> >   3 | 751039 | 798753 | 454309 | x
> >   4 | 128505 | 329643 | 280553 | x
> >   5 | 105600 | 257225 | 710015 | x
> >   6 | 323891 | 615614 |  83206 | x
> >   7 | 194054 |  63506 | 353171 | x
> >   8 | 212068 | 881225 | 271804 | x
> >   9 | 644180 |  26693 | 200738 | x
> >  10 | 136586 | 498699 | 554417 | x
> > (10 rows)
> >
> >
> >
> >
> > Here's how I p

Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2

2013-05-27 Thread John Mudd
Thanks again.

Well, I have two problems with using the CLUSTER option. It's only
temporary since any updates, depending how much free space is reserved per
page, requires re-running the CLUSTER. And my primary concern is that it
arbitrarily gives an unfair advantage to the primary key SELECT. Still,
it's easy to test so here are the results. The primary key still looses
even with the CLUSTER. Granted it is close but considering this is now an
unfair comparison it still doesn't make sense to me. How can a search for a
specific row that should be fairly straight forward take longer than a
search that includes an ORDER BY clause?


test=# CLUSTER test_select USING test_select_pkey ;
CLUSTER
test=# VACUUM ANALYZE test_select ;
VACUUM

(stopped postgres; reset O/S cache; started postgres)


test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 ORDER
BY key1, key2, key3, id LIMIT 1;
  QUERY PLAN
--
 Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=19.430..19.431
rows=1 loops=1)
   ->  Index Scan using my_key on test_select  (cost=0.00..41938.15
rows=42 width=21) (actual time=19.428..19.428 rows=1 loops=1)
 Index Cond: (key1 >= 50)
 Total runtime: 19.526 ms


(stopped postgres; reset O/S cache; started postgres)


test=# explain analyze SELECT * FROM test_select WHERE id = 50;
   QUERY PLAN

-
 Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1
width=21) (actual time=21.070..21.072 rows=1 loops=1)
   Index Cond: (id = 50)
 Total runtime: 21.178 ms




On Mon, May 27, 2013 at 10:59 AM, Evgeny Shishkin wrote:

>
> On May 27, 2013, at 6:35 PM, John Mudd  wrote:
>
> Thanks, that's easy enough to test. Didn't seem to help though.
>
>
> Ok. And if you CLUSTER tables USING PK?
>
>
> test=# REINDEX index test_select_pkey;
> REINDEX
> test=# VACUUM ANALYZE test_select ;
> VACUUM
>
>
> (stopped postgres; reset O/S cache; started postgres)
>
> test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50
> ORDER BY key1, key2, key3, id LIMIT 1;
>   QUERY PLAN
>
>
> --
>  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=16.368..16.369
> rows=1 loops=1)
>->  Index Scan using my_key on test_select  (cost=0.00..41981.16
> rows=501333 width=21) (actual time=16.366..16.366 rows=1 loops=1)
>  Index Cond: (key1 >= 50)
>  Total runtime: 16.444 ms
>
>
> (stopped postgres; reset O/S cache; started postgres)
>
> test=# explain analyze SELECT * FROM test_select WHERE id = 50;
>QUERY PLAN
>
> -
>   Index Scan using test_select_pkey on test_select  (cost=0.00..8.36
> rows=1 width=21) (actual time=23.072..23.074 rows=1 loops=1)
>    Index Cond: (id = 50)
>  Total runtime: 23.192 ms
>
>
>
>
> On Mon, May 27, 2013 at 10:21 AM, Evgeny Shishkin wrote:
>
>>
>> On May 27, 2013, at 6:02 PM, John Mudd  wrote:
>>
>> > Postgres 9.1.2 on Ubuntu 12.04
>> >
>> > Any reason why a select by primary key would be slower than a select
>> that includes an ORDER BY? I was really hoping using the primary key would
>> give me a boost.
>> >
>>
>> You created my_key after data loading, and PK was there all the time.
>> If you REINDEX PK, i bet it will be as fast.
>>
>> > I stopped the server and cleared the O/S cache using "sync; echo 3 >
>> /proc/sys/vm/drop_caches" between the runs.
>> >
>> >
>> >
>> > test=# VACUUM ANALYZE test_select;
>> > VACUUM
>> >
>> > (stopped postgres; reset O/S cache; started postgres)
>> >
>> > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50
>> ORDER BY key1, key2, key3, id LIMIT 1;
>> >   QUERY PLAN
>> >
>> --
>> >  Limit  (cost=0.00..

Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2

2013-05-30 Thread John Mudd
I flushed the caches in an attempt to get meaningful results. I've seen
complaints to previous posts that don't include clearing the caches.

I agree this tends to be artificial in another direction. I will strive to
come up with a more realistic test environment next time. Maybe performing
many random reads initially to fill the caches with random blocks. That
might allow for minimal assistance from the cache and be more realistic.



On Thu, May 30, 2013 at 11:13 AM, Merlin Moncure  wrote:

> On Mon, May 27, 2013 at 9:02 AM, John Mudd  wrote:
> > Postgres 9.1.2 on Ubuntu 12.04
> >
> > Any reason why a select by primary key would be slower than a select that
> > includes an ORDER BY? I was really hoping using the primary key would
> give
> > me a boost.
> >
> > I stopped the server and cleared the O/S cache using "sync; echo 3 >
> > /proc/sys/vm/drop_caches" between the runs.
> >
> >
> >
> > test=# VACUUM ANALYZE test_select;
> > VACUUM
> >
> > (stopped postgres; reset O/S cache; started postgres)
> >
> > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50
> ORDER
> > BY key1, key2, key3, id LIMIT 1;
> >   QUERY PLAN
> >
> --
> >  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600
> rows=1
> > loops=1)
> >->  Index Scan using my_key on test_select  (cost=0.00..41895.49
> > rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)
> >  Index Cond: (key1 >= 50)
> >  Total runtime: 12.678 ms
> >
> > (stopped postgres; reset O/S cache; started postgres)
>
>
> why are you flushing postgres/os cache?  when you do that, you are
> measuring raw read time from disks.  Typical disk seek time is
> measured in milliseconds so the timings are completely appropriate
> once you remove caching effects. Hard drives (at least, the spinning
> kind) are slow and one of the major challenges of database and
> hardware engineering is working around their limitations.  Fortunately
> it looks like faster storage will soon be commonplace for reasonable
> prices.
>
> merlin
>