John,

And can you please include BUFFERS in the EXPLAIN ANALYZE?
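For reference, the BUFFERS option is added alongside ANALYZE like this (a generic sketch against John's tables, not output from the thread); it reports per-node counts of shared-buffer hits and reads, i.e. how many 8 kB pages each query actually touched:

```sql
-- BUFFERS shows how many pages each plan node hit in cache vs. read from disk,
-- which is exactly the I/O difference being debated in this thread.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM test_select WHERE id = 500000;

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM test_select WHERE key1 >= 500000
ORDER BY key1, key2, key3, id LIMIT 1;
```

The `(ANALYZE, BUFFERS)` syntax is available from PostgreSQL 9.0 on, so it works on John's 9.1.2.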
Regards,
Roman Konoval

On Tue, May 28, 2013 at 9:48 AM, Evgeniy Shishkin <itparan...@gmail.com> wrote:

> On 28.05.2013, at 2:17, John Mudd <johnbm...@gmail.com> wrote:
>
>> Thanks again.
>>
>> Well, I have two problems with using the CLUSTER option. It's only
>> temporary, since any updates (depending on how much free space is
>> reserved per page) require re-running the CLUSTER. And my primary
>> concern is that it arbitrarily gives an unfair advantage to the primary
>> key SELECT. Still, it's easy to test, so here are the results. The
>> primary key still loses even with the CLUSTER. Granted, it is close, but
>> considering this is now an unfair comparison it still doesn't make sense
>> to me. How can a search for a specific row, which should be fairly
>> straightforward, take longer than a search that includes an ORDER BY
>> clause?
>
> Well, you do just a regular index scan because of the LIMIT 1.
> And now it is just a matter of index size and table organization.
>
> I also don't understand why you consider CLUSTER unfair - the way you
> populated the table was a natural cluster over my_key.
>
> But it bothers me why my_key is always better. Can you please test it on
> different values but the same rows? Because right now these are two
> different tuples and you count every I/O.
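One way to run the test Evgeniy suggests, fetching the same physical row through both indexes, might look like this (the :k1, :k2, :k3 placeholders stand for the values returned by the first query; they are illustrative, not from the thread):

```sql
-- Find the composite key of the very row the primary-key test fetches:
SELECT key1, key2, key3 FROM test_select WHERE id = 500000;

-- Then fetch that same row through my_key, substituting the values above,
-- so both plans land on the same heap tuple and the I/O is comparable:
EXPLAIN ANALYZE
SELECT * FROM test_select
WHERE key1 = :k1 AND key2 = :k2 AND key3 = :k3 AND id = 500000;
```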
>> test=# CLUSTER test_select USING test_select_pkey ;
>> CLUSTER
>> test=# VACUUM ANALYZE test_select ;
>> VACUUM
>>
>> (stopped postgres; reset O/S cache; started postgres)
>>
>> test=# explain analyze SELECT * FROM test_select WHERE key1 >= 500000 ORDER BY key1, key2, key3, id LIMIT 1;
>>                                                               QUERY PLAN
>> --------------------------------------------------------------------------------------------------------------------------------------
>>  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=19.430..19.431 rows=1 loops=1)
>>    ->  Index Scan using my_key on test_select  (cost=0.00..41938.15 rows=499992 width=21) (actual time=19.428..19.428 rows=1 loops=1)
>>          Index Cond: (key1 >= 500000)
>>  Total runtime: 19.526 ms
>>
>> (stopped postgres; reset O/S cache; started postgres)
>>
>> test=# explain analyze SELECT * FROM test_select WHERE id = 500000;
>>                                                             QUERY PLAN
>> ---------------------------------------------------------------------------------------------------------------------------------
>>  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 width=21) (actual time=21.070..21.072 rows=1 loops=1)
>>    Index Cond: (id = 500000)
>>  Total runtime: 21.178 ms
>>
>> On Mon, May 27, 2013 at 10:59 AM, Evgeny Shishkin <itparan...@gmail.com> wrote:
>>
>>> On May 27, 2013, at 6:35 PM, John Mudd <johnbm...@gmail.com> wrote:
>>>
>>>> Thanks, that's easy enough to test. Didn't seem to help though.
>>>
>>> Ok. And if you CLUSTER the tables USING the PK?
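Evgeniy's point that the load order already amounted to a natural cluster over my_key can be checked from the planner statistics (a sketch, to be run after ANALYZE): `pg_stats.correlation` near 1.0 or -1.0 means a column's values sit almost sequentially in the heap, near 0 means random placement.

```sql
-- After CLUSTER ... USING test_select_pkey, id should correlate near 1.0;
-- before it, key1 would have been the well-correlated column.
SELECT attname, correlation
FROM pg_stats
WHERE schemaname = 'public' AND tablename = 'test_select';
```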
>>>> test=# REINDEX index test_select_pkey;
>>>> REINDEX
>>>> test=# VACUUM ANALYZE test_select ;
>>>> VACUUM
>>>>
>>>> (stopped postgres; reset O/S cache; started postgres)
>>>>
>>>> test=# explain analyze SELECT * FROM test_select WHERE key1 >= 500000 ORDER BY key1, key2, key3, id LIMIT 1;
>>>>                                                               QUERY PLAN
>>>> --------------------------------------------------------------------------------------------------------------------------------------
>>>>  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=16.368..16.369 rows=1 loops=1)
>>>>    ->  Index Scan using my_key on test_select  (cost=0.00..41981.16 rows=501333 width=21) (actual time=16.366..16.366 rows=1 loops=1)
>>>>          Index Cond: (key1 >= 500000)
>>>>  Total runtime: 16.444 ms
>>>>
>>>> (stopped postgres; reset O/S cache; started postgres)
>>>>
>>>> test=# explain analyze SELECT * FROM test_select WHERE id = 500000;
>>>>                                                             QUERY PLAN
>>>> ---------------------------------------------------------------------------------------------------------------------------------
>>>>  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 width=21) (actual time=23.072..23.074 rows=1 loops=1)
>>>>    Index Cond: (id = 500000)
>>>>  Total runtime: 23.192 ms
>>>>
>>>> On Mon, May 27, 2013 at 10:21 AM, Evgeny Shishkin <itparan...@gmail.com> wrote:
>>>>
>>>>> On May 27, 2013, at 6:02 PM, John Mudd <johnbm...@gmail.com> wrote:
>>>>>
>>>>>> Postgres 9.1.2 on Ubuntu 12.04
>>>>>>
>>>>>> Any reason why a select by primary key would be slower than a select
>>>>>> that includes an ORDER BY? I was really hoping using the primary key
>>>>>> would give me a boost.
>>>>>
>>>>> You created my_key after data loading, and the PK was there all the time.
>>>>> If you REINDEX the PK, I bet it will be as fast.
>>>>>
>>>>>> I stopped the server and cleared the O/S cache using "sync; echo 3 >
>>>>>> /proc/sys/vm/drop_caches" between the runs.
>>>>>> test=# VACUUM ANALYZE test_select;
>>>>>> VACUUM
>>>>>>
>>>>>> (stopped postgres; reset O/S cache; started postgres)
>>>>>>
>>>>>> test=# explain analyze SELECT * FROM test_select WHERE key1 >= 500000 ORDER BY key1, key2, key3, id LIMIT 1;
>>>>>>                                                               QUERY PLAN
>>>>>> --------------------------------------------------------------------------------------------------------------------------------------
>>>>>>  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600 rows=1 loops=1)
>>>>>>    ->  Index Scan using my_key on test_select  (cost=0.00..41895.49 rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)
>>>>>>          Index Cond: (key1 >= 500000)
>>>>>>  Total runtime: 12.678 ms
>>>>>>
>>>>>> (stopped postgres; reset O/S cache; started postgres)
>>>>>>
>>>>>> test=# explain analyze SELECT * FROM test_select WHERE id = 500000;
>>>>>>                                                             QUERY PLAN
>>>>>> ---------------------------------------------------------------------------------------------------------------------------------
>>>>>>  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 width=21) (actual time=31.396..31.398 rows=1 loops=1)
>>>>>>    Index Cond: (id = 500000)
>>>>>>  Total runtime: 31.504 ms
>>>>>>
>>>>>>
>>>>>> Schema:
>>>>>>
>>>>>> test=# \d test_select
>>>>>>                              Table "public.test_select"
>>>>>>  Column |     Type     |                        Modifiers
>>>>>> --------+--------------+-----------------------------------------------------------
>>>>>>  id     | integer      | not null default nextval('test_select_id_seq'::regclass)
>>>>>>  key1   | integer      |
>>>>>>  key2   | integer      |
>>>>>>  key3   | integer      |
>>>>>>  data   | character(4) |
>>>>>> Indexes:
>>>>>>     "test_select_pkey" PRIMARY KEY, btree (id)
>>>>>>     "my_key" btree (key1, key2, key3, id)
>>>>>>
>>>>>> test=#
>>>>>>
>>>>>>
>>>>>> Sample data:
>>>>>>
>>>>>> test=# SELECT * FROM test_select LIMIT 10;
>>>>>>  id |  key1  |  key2  |  key3  | data
>>>>>> ----+--------+--------+--------+------
>>>>>>   1 | 984966 | 283954 | 772063 | x
>>>>>>   2 | 817668 | 393533 | 924888 | x
>>>>>>   3 | 751039 | 798753 | 454309 | x
>>>>>>   4 | 128505 | 329643 | 280553 | x
>>>>>>   5 | 105600 | 257225 | 710015 | x
>>>>>>   6 | 323891 | 615614 |  83206 | x
>>>>>>   7 | 194054 |  63506 | 353171 | x
>>>>>>   8 | 212068 | 881225 | 271804 | x
>>>>>>   9 | 644180 |  26693 | 200738 | x
>>>>>>  10 | 136586 | 498699 | 554417 | x
>>>>>> (10 rows)
>>>>>>
>>>>>>
>>>>>> Here's how I populated the table:
>>>>>>
>>>>>> import random
>>>>>>
>>>>>> import psycopg2
>>>>>>
>>>>>> conn = psycopg2.connect('dbname=test')
>>>>>> cur = conn.cursor()
>>>>>>
>>>>>> def random_int():
>>>>>>     n = 1000000
>>>>>>     return random.randint(0, n)
>>>>>>
>>>>>> def random_key():
>>>>>>     return random_int(), random_int(), random_int()
>>>>>>
>>>>>> def create_table():
>>>>>>     cur.execute('''
>>>>>>         DROP TABLE IF EXISTS test_select;
>>>>>>
>>>>>>         CREATE TABLE test_select (
>>>>>>             id   SERIAL PRIMARY KEY,
>>>>>>             key1 INTEGER,
>>>>>>             key2 INTEGER,
>>>>>>             key3 INTEGER,
>>>>>>             data char(4)
>>>>>>         );
>>>>>>     ''')
>>>>>>     conn.commit()
>>>>>>
>>>>>>     n = 1000000
>>>>>>     for i in range(n):
>>>>>>         cur.execute("INSERT INTO test_select(key1, key2, key3, data) VALUES(%s, %s, %s, 'x')", random_key())
>>>>>>     conn.commit()
>>>>>>
>>>>>>     cur.execute('CREATE INDEX my_key ON test_select(key1, key2, key3, id)')
>>>>>>     conn.commit()
>>>>>>
>>>>>> create_table()
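As an aside, the row-at-a-time INSERT loop in the script above is the slow path for loading a million rows. A hedged sketch of an alternative (make_rows is a hypothetical helper, not part of John's script): build the rows as CSV in memory and feed them to PostgreSQL's COPY via psycopg2's copy_expert, which loads the whole batch in one command.

```python
import io
import random

def make_rows(n, max_val=1000000, seed=None):
    """Build n CSV rows of (key1, key2, key3, data) matching the script above."""
    rng = random.Random(seed)
    buf = io.StringIO()
    for _ in range(n):
        # Same value distribution as random_key() in John's script; data is 'x'.
        buf.write("%d,%d,%d,x\n" % (rng.randint(0, max_val),
                                    rng.randint(0, max_val),
                                    rng.randint(0, max_val)))
    buf.seek(0)
    return buf

# With a live connection this would replace the INSERT loop, e.g.:
#   cur.copy_expert(
#       "COPY test_select (key1, key2, key3, data) FROM STDIN WITH (FORMAT csv)",
#       make_rows(1000000))
#   conn.commit()
```

Note that COPY skips the per-statement parse/plan overhead of a million INSERTs; the load order (and hence the natural clustering over my_key that Evgeniy points out) is unchanged.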