On 28.05.2013, at 2:17, John Mudd <johnbm...@gmail.com> wrote:

> Thanks again.
> 
> Well, I have two problems with using the CLUSTER option. It's only temporary: 
> depending on how much free space is reserved per page, updates will eventually 
> require re-running the CLUSTER. And my primary concern is that it arbitrarily 
> gives an unfair advantage to the primary key SELECT. Still, it's easy to test, 
> so here are the results. The primary key still loses even with the CLUSTER. 
> Granted it is close, but considering this is now an unfair comparison it still 
> doesn't make sense to me. How can a search for a specific row, which should be 
> fairly straightforward, take longer than a search that includes an ORDER BY 
> clause?
> 

Well, because of the LIMIT 1 both queries boil down to a plain index scan that 
fetches a single tuple; the ORDER BY never actually sorts anything, it just 
steers the planner onto my_key, which it can read in order and stop after one row.
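
If you repeat both with EXPLAIN (ANALYZE, BUFFERS) you can see exactly how many 
pages each lookup touches on your cold cache. Just a sketch, same two queries 
as yours:

test=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM test_select WHERE key1 >= 500000 ORDER BY key1, key2, key3, id LIMIT 1;
test=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM test_select WHERE id = 500000;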

At that point it is just a matter of index size and table organization.
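
You can check the size part directly; a quick sketch, using the index names 
from your schema:

test=# SELECT relname, pg_size_pretty(pg_relation_size(oid)) FROM pg_class WHERE relname IN ('test_select_pkey', 'my_key');

A freshly built index like my_key is densely packed, while one that absorbed a 
million single-row inserts, like the PK, can carry extra pages left behind by 
page splits.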

I also don't understand why you consider CLUSTER unfair - with a serial id, the 
way you populated the table left it naturally clustered over the primary key 
anyway, so CLUSTERing on the PK barely changes the physical order.

But it bothers me that my_key always wins. Can you please test both lookups on 
the same row instead of different values? Right now the two queries fetch two 
different tuples, so on a cold cache you count every I/O for each of them 
separately and the timings aren't directly comparable.
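
For example, something like this (just a sketch, reusing the row with id = 
500000 from your test: run the first query, plug the returned values in for 
<k1>, <k2>, <k3>, and reset the cache between the two EXPLAINs):

test=# SELECT key1, key2, key3 FROM test_select WHERE id = 500000;
test=# EXPLAIN ANALYZE SELECT * FROM test_select WHERE key1 = <k1> AND key2 = <k2> AND key3 = <k3> AND id = 500000;
test=# EXPLAIN ANALYZE SELECT * FROM test_select WHERE id = 500000;

That way both queries land on the same heap page and differ only in which index 
they descend.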

> 
> test=# CLUSTER test_select USING test_select_pkey ;
> CLUSTER
> test=# VACUUM ANALYZE test_select ;
> VACUUM
> 
> (stopped postgres; reset O/S cache; started postgres)
> 
> 
> test=# explain analyze SELECT * FROM test_select WHERE key1 >= 500000 ORDER BY key1, key2, key3, id LIMIT 1;
>                                                               QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=19.430..19.431 rows=1 loops=1)
>    ->  Index Scan using my_key on test_select  (cost=0.00..41938.15 rows=499992 width=21) (actual time=19.428..19.428 rows=1 loops=1)
>          Index Cond: (key1 >= 500000)
>  Total runtime: 19.526 ms
> 
> 
> (stopped postgres; reset O/S cache; started postgres)
> 
> 
> test=# explain analyze SELECT * FROM test_select WHERE id = 500000;
>                                                            QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 width=21) (actual time=21.070..21.072 rows=1 loops=1)
>    Index Cond: (id = 500000)
>  Total runtime: 21.178 ms
> 
> 
> 
> 
> On Mon, May 27, 2013 at 10:59 AM, Evgeny Shishkin <itparan...@gmail.com> wrote:
>> 
>> On May 27, 2013, at 6:35 PM, John Mudd <johnbm...@gmail.com> wrote:
>> 
>>> Thanks, that's easy enough to test. Didn't seem to help though.
>> 
>> Ok. And what if you CLUSTER the table USING the PK?
>> 
>>> 
>>> test=# REINDEX index test_select_pkey;
>>> REINDEX
>>> test=# VACUUM ANALYZE test_select ;
>>> VACUUM
>>> 
>>> 
>>> (stopped postgres; reset O/S cache; started postgres)
>>> 
>>> test=# explain analyze SELECT * FROM test_select WHERE key1 >= 500000 ORDER BY key1, key2, key3, id LIMIT 1;
>>>                                                               QUERY PLAN
>>> --------------------------------------------------------------------------------------------------------------------------------------
>>>  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=16.368..16.369 rows=1 loops=1)
>>>    ->  Index Scan using my_key on test_select  (cost=0.00..41981.16 rows=501333 width=21) (actual time=16.366..16.366 rows=1 loops=1)
>>>          Index Cond: (key1 >= 500000)
>>>  Total runtime: 16.444 ms
>>> 
>>> 
>>> (stopped postgres; reset O/S cache; started postgres)
>>> 
>>> test=# explain analyze SELECT * FROM test_select WHERE id = 500000;
>>>                                                            QUERY PLAN
>>> ---------------------------------------------------------------------------------------------------------------------------------
>>>  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 width=21) (actual time=23.072..23.074 rows=1 loops=1)
>>>    Index Cond: (id = 500000)
>>>  Total runtime: 23.192 ms
>>> 
>>> 
>>> 
>>> 
>>> On Mon, May 27, 2013 at 10:21 AM, Evgeny Shishkin <itparan...@gmail.com> wrote:
>>>> 
>>>> On May 27, 2013, at 6:02 PM, John Mudd <johnbm...@gmail.com> wrote:
>>>> 
>>>> > Postgres 9.1.2 on Ubuntu 12.04
>>>> >
>>>> > Any reason why a select by primary key would be slower than a select 
>>>> > that includes an ORDER BY? I was really hoping using the primary key 
>>>> > would give me a boost.
>>>> >
>>>> 
>>>> You created my_key after the data was loaded, so it was built fresh and 
>>>> densely packed, while the PK index was maintained incrementally through 
>>>> the whole load.
>>>> If you REINDEX the PK, I bet it will be just as fast.
>>>> 
>>>> > I stopped the server and cleared the O/S cache using "sync; echo 3 > 
>>>> > /proc/sys/vm/drop_caches" between the runs.
>>>> >
>>>> >
>>>> >
>>>> > test=# VACUUM ANALYZE test_select;
>>>> > VACUUM
>>>> >
>>>> > (stopped postgres; reset O/S cache; started postgres)
>>>> >
>>>> > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 500000 ORDER BY key1, key2, key3, id LIMIT 1;
>>>> >                                                               QUERY PLAN
>>>> > --------------------------------------------------------------------------------------------------------------------------------------
>>>> >  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600 rows=1 loops=1)
>>>> >    ->  Index Scan using my_key on test_select  (cost=0.00..41895.49 rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)
>>>> >          Index Cond: (key1 >= 500000)
>>>> >  Total runtime: 12.678 ms
>>>> >
>>>> > (stopped postgres; reset O/S cache; started postgres)
>>>> >
>>>> > test=# explain analyze SELECT * FROM test_select WHERE id = 500000;
>>>> >                                                            QUERY PLAN
>>>> > ---------------------------------------------------------------------------------------------------------------------------------
>>>> >  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 width=21) (actual time=31.396..31.398 rows=1 loops=1)
>>>> >    Index Cond: (id = 500000)
>>>> >  Total runtime: 31.504 ms
>>>> >
>>>> >
>>>> >
>>>> > Schema:
>>>> >
>>>> > test=# \d test_select
>>>> >                             Table "public.test_select"
>>>> >  Column |     Type     |                        Modifiers
>>>> > --------+--------------+----------------------------------------------------------
>>>> >  id     | integer      | not null default nextval('test_select_id_seq'::regclass)
>>>> >  key1   | integer      |
>>>> >  key2   | integer      |
>>>> >  key3   | integer      |
>>>> >  data   | character(4) |
>>>> > Indexes:
>>>> >     "test_select_pkey" PRIMARY KEY, btree (id)
>>>> >     "my_key" btree (key1, key2, key3, id)
>>>> >
>>>> > test=#
>>>> >
>>>> >
>>>> >
>>>> > Sample data:
>>>> >
>>>> > test=# SELECT * FROM test_select LIMIT 10;
>>>> >  id |  key1  |  key2  |  key3  | data
>>>> > ----+--------+--------+--------+------
>>>> >   1 | 984966 | 283954 | 772063 | x
>>>> >   2 | 817668 | 393533 | 924888 | x
>>>> >   3 | 751039 | 798753 | 454309 | x
>>>> >   4 | 128505 | 329643 | 280553 | x
>>>> >   5 | 105600 | 257225 | 710015 | x
>>>> >   6 | 323891 | 615614 |  83206 | x
>>>> >   7 | 194054 |  63506 | 353171 | x
>>>> >   8 | 212068 | 881225 | 271804 | x
>>>> >   9 | 644180 |  26693 | 200738 | x
>>>> >  10 | 136586 | 498699 | 554417 | x
>>>> > (10 rows)
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > Here's how I populated the table:
>>>> >
>>>> > import random
>>>> > import psycopg2
>>>> >
>>>> > conn = psycopg2.connect('dbname=test')
>>>> >
>>>> > cur = conn.cursor()
>>>> >
>>>> > def random_int():
>>>> >     n = 1000000
>>>> >     return random.randint(0,n)
>>>> >
>>>> > def random_key():
>>>> >     return random_int(), random_int(), random_int()
>>>> >
>>>> > def create_table():
>>>> >     cur.execute('''
>>>> >             DROP TABLE IF EXISTS test_select;
>>>> >
>>>> >             CREATE TABLE test_select (
>>>> >                 id                      SERIAL PRIMARY KEY,
>>>> >                 key1                    INTEGER,
>>>> >                 key2                    INTEGER,
>>>> >                 key3                    INTEGER,
>>>> >                 data                    char(4)
>>>> >             );
>>>> >         ''')
>>>> >     conn.commit()
>>>> >
>>>> >     n = 1000000
>>>> >     for i in range(n):
>>>> >         cur.execute("INSERT INTO test_select(key1, key2, key3, data) VALUES(%s, %s, %s, 'x')", random_key())
>>>> >     conn.commit()
>>>> >
>>>> >     cur.execute('CREATE INDEX my_key ON test_select(key1, key2, key3, id)')
>>>> >     conn.commit()
>>>> >
>>>> > create_table()
>>>> >
> 
