Re: [PERFORM] strategies for optimizing read on rather large tables

PFC Sat, 04 Jun 2005 04:21:08 -0700

select advert_id from acr_cache where category_id = ? and region_id = ?
order by XXX {asc|desc} limit 20;

where XXX is one of 5 possible fields,
timestamp,
timestamp,
text,
text,
numeric


        Create 5 indexes on ( category_id, region_id, a field )
        where "a field" is one of your 5 fields.

Then write your query as :

select advert_id from acr_cache where category_id = ? and region_id = ?
order by category_id, region_id, XXX limit 20;

select advert_id from acr_cache where category_id = ? and region_id = ?
order by category_id desc, region_id desc, XXX desc limit 20;

This should put your query down to a millisecond. It will use the indexfor the lookup, the sort and the limit, and hence only retrieve 20 rowsfor the table. Downside is you have 5 indexes, but that's not so bad.

If your categories and regions form a tree, you should definitely use altree datatype, which enables indexed operators like "is contained in"which would probably allow you to reduce the size of your cache table alot.


we created index on acr_cache (category_id, region_id)
and it works rather well.
usually.
if a given "crossing" (category + region) has small amount of ads (less
then 10000) - the query is good enough (up to 300 miliseconds).
but when we enter the crossings which result in 50000 ads - the query
takes up to 10 seconds.
which is almost "forever".

we thought about creating indices like this:
index on acr_cache (effective_date);
where effective_dateis on of the timestamp fields.
it worked well for the crossings with lots of ads, but when we asked for
small crossing (like 1000 ads) it took > 120 seconds!
it appears that postgresql was favorizing this new advert instead of
using much better index on category_id and region_id.

actually - i'm not sure what to do next.
i am even thinkinh about createing special indices (partial) for big
crossings, but that's just weird. plus the fact that already the
acr_cache vacuum time exceeds 3 hours!.


any suggestions?
hardware is dual xeon 3 ghz, 4G ram, hardware scsi raid put into raid 1.
settings in postgresql.conf:
listen_addresses = '*'
port = 5800
max_connections = 300
superuser_reserved_connections = 50
shared_buffers = 131072
work_mem = 4096
maintenance_work_mem = 65536
fsync = false
commit_delay = 100
commit_siblings = 5
checkpoint_segments = 10
effective_cache_size = 10000
random_page_cost = 1.1
log_destination = 'stderr'
redirect_stderr = true
log_directory = '/home/pgdba/logs'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_truncate_on_rotation = false
log_rotation_age = 1440
log_rotation_size = 502400
log_min_duration_statement = -1
log_connections = true
log_duration = true
log_line_prefix = '[%t] [%p] <[EMAIL PROTECTED]> '
log_statement = 'all'
stats_start_collector = true
stats_command_string = true
stats_block_level = true
stats_row_level = true
stats_reset_on_server_start = true
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'

actual max numer of connection is 120 plus some administrativeconnections (psql sessions).

postgresql version 8.0.2 on linux debian sarge.

best regards,

depesz




---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [PERFORM] strategies for optimizing read on rather large tables

Reply via email to