Re: [PERFORM] select on 1milion register = 6s
On 7/28/07, Bruno Rodrigues Siqueira <[EMAIL PROTECTED]> wrote:
> Ok.
> QUERY PLAN
> Sort (cost=11449.37..11449.40 rows=119 width=8) (actual time=14431.537..14431.538 rows=2 loops=1)
>   Sort Key: to_char(data_encerramento, 'yyyy-mm'::text)
>   -> HashAggregate (cost=11448.79..11448.96 rows=119 width=8) (actual time=14431.521..14431.523 rows=2 loops=1)
>      -> Index Scan using detalhamento_bas_idx3003 on detalhamento_bas (cost=0.00..11442.95 rows=11679 width=8) (actual time=0.135..12719.155 rows=2335819 loops=1)

See the row mismatch there? It expects about 11k rows, gets back 2.3 million. That's a pretty big misestimate. Have you run analyze recently on this table?

Is there a reason you're doing this:

to_char( data_encerramento ,'yyyy-mm') between '2006-12' and '2007-01'

when you should be able to just do:

data_encerramento between '2006-12-01' and '2007-01-31'

? That should be able to use good estimates from analyze. My guess is the planner is making a bad guess because of the way you're handling the dates.

> SERVER
> DELL PowerEdge 2950
> XEON Quad-Core 3.0GHz
> 4Gb RAM
> Linux CentOS 5.0 64-bits
> Postgres 8.1.4
> Postgresql.conf
> # - Memory -
>
> shared_buffers = 50000          # min 16 or max_connections*2, 8KB each

400 Meg is kind of low for a server with 4 G ram. 25% is more reasonable (i.e. 125000 buffers).

> work_mem = 3145728              # min 64, size in KB
> maintenance_work_mem = 4194304  # min 1024, size in KB

Whoa nellie! That's ~3 gig of work_mem, and 4 gig of maintenance_work_mem. In a machine with 4 gig ram, that's a recipe for disaster. Something more reasonable would be 128000 (~125 meg) for each; since you've limited your machine to 10 connections you should be ok. Setting work_mem too high can run your machine out of memory and into a swap storm that will kill performance.

> fsync = off                     # turns forced synchronization on or off

So, the data in this database isn't important? Cause that's what fsync = off says to me. Better to buy yourself a nice battery backed caching RAID controller than turn off fsync.

> effective_cache_size = 41943040 # typically 8KB each

And your machine has 343,604,830,208 bytes of memory available for caching? Seems a little high to me.

> random_page_cost = 1            # units are one sequential page fetch

Seldom if ever is it a good idea to bonk the planner on the head with random_page_cost=1. Setting it to 1.2 to 1.4 is low enough, but 1.4 to 2.0 is more realistic.

> stats_start_collector = off
> #stats_command_string = off
> #stats_block_level = off
> #stats_row_level = off
> #stats_reset_on_server_start = off

I think you need stats_row_level on for autovacuum, but I'm not 100% sure.

Let us know what happens after fixing these settings, running analyze, and re-running explain analyze, with possible changes to the query.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at
http://www.postgresql.org/about/donate
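Scott's suggested rewrite can be sanity-checked outside Postgres. The sketch below uses Python's sqlite3 purely for illustration (table and column names are borrowed from the thread; SQLite stores the timestamps as text, so this demonstrates the predicate logic, not the Postgres planner). It also surfaces one subtlety: since data_encerramento holds timestamps, an inclusive `BETWEEN ... AND '2007-01-31'` silently drops rows from January 31 after midnight, while a half-open range keeps them and remains sargable.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE detalhamento_bas (data_encerramento TEXT)")
con.executemany(
    "INSERT INTO detalhamento_bas VALUES (?)",
    [("2006-11-30 23:59:59",),   # before the window
     ("2006-12-15 08:00:00",),   # inside
     ("2007-01-31 12:00:00",),   # inside, but on the window's last day
     ("2007-02-01 00:00:00",)],  # after the window
)

def count(where):
    return con.execute(
        f"SELECT count(*) FROM detalhamento_bas WHERE {where}").fetchone()[0]

# Original style: format each timestamp down to 'yyyy-mm' and compare
# months (strftime stands in for Postgres' to_char here).
by_month = count(
    "strftime('%Y-%m', data_encerramento) BETWEEN '2006-12' AND '2007-01'")

# Naive rewrite: loses '2007-01-31 12:00:00', because it sorts after
# the bare date string '2007-01-31'.
naive = count(
    "data_encerramento BETWEEN '2006-12-01' AND '2007-01-31'")

# Half-open range: same rows as the month comparison, and it can use
# a plain index on the column with good statistics.
half_open = count(
    "data_encerramento >= '2006-12-01' AND data_encerramento < '2007-02-01'")

print(by_month, naive, half_open)   # -> 2 1 2
```

So the half-open form, not the literal BETWEEN two dates, is the drop-in replacement for the to_char comparison when the column is a timestamp.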
RES: [PERFORM] select on 1milion register = 6s
Yes, I do.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On behalf of Hervé Piedvache
Sent: Saturday, July 28, 2007 16:57
To: pgsql-performance@postgresql.org
Cc: Bruno Rodrigues Siqueira
Subject: Re: [PERFORM] select on 1milion register = 6s

Have you analyzed your table before doing this?

On Saturday, July 28, 2007, Bruno Rodrigues Siqueira wrote:
> Friends,
>
> Who can help me? My SELECT on a table with 1 million registers, using an
> expression index, takes 6 seconds.
>
> Please, I don't know how to make it better.
>
> Thanks
--
Hervé Piedvache

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
RES: [PERFORM] select on 1milion register = 6s
Yes. Look at this... and please, tell me if you can help me... Thanks

Query:

EXPLAIN ANALYZE
select to_char(data_encerramento,'mm/yyyy') as opcoes_mes,
       to_char(data_encerramento,'yyyy-mm') as ordem
from detalhamento_bas
where to_char( data_encerramento ,'yyyy-mm')
      between '2006-12' and '2007-01'
GROUP BY opcoes_mes, ordem
ORDER BY ordem DESC

QUERY PLAN:

Sort (cost=11449.37..11449.40 rows=119 width=8) (actual time=14431.537..14431.538 rows=2 loops=1)
  Sort Key: to_char(data_encerramento, 'yyyy-mm'::text)
  -> HashAggregate (cost=11448.79..11448.96 rows=119 width=8) (actual time=14431.521..14431.523 rows=2 loops=1)
     -> Index Scan using detalhamento_bas_idx3003 on detalhamento_bas (cost=0.00..11442.95 rows=11679 width=8) (actual time=0.135..12719.155 rows=2335819 loops=1)
        Index Cond: ((to_char(data_encerramento, 'yyyy-mm'::text) >= '2006-12'::text) AND (to_char(data_encerramento, 'yyyy-mm'::text) <= '2007-01'::text))
Total runtime: 14431.605 ms

SERVER
DELL PowerEdge 2950
XEON Quad-Core 3.0GHz
4Gb RAM
Linux CentOS 5.0 64-bits
Postgres 8.1.4

Postgresql.conf:

# -----------------------------
# PostgreSQL configuration file
# -----------------------------
#
# This file consists of lines of the form:
#
#   name = value
#
# (The '=' is optional.) White space may be used. Comments are introduced
# with '#' anywhere on a line. The complete list of option names and
# allowed values can be found in the PostgreSQL documentation. The
# commented-out settings shown in this file represent the default values.
#
# Please note that re-commenting a setting is NOT sufficient to revert it
# to the default value, unless you restart the postmaster.
#
# Any option can also be given as a command line switch to the
# postmaster, e.g. 'postmaster -c log_connections=on'. Some options
# can be changed at run-time with the 'SET' SQL command.
#
# This file is read on postmaster startup and when the postmaster
# receives a SIGHUP. If you edit the file on a running system, you have
# to SIGHUP the postmaster for the changes to take effect, or use
# "pg_ctl reload". Some settings, such as listen_addresses, require
# a postmaster shutdown and restart to take effect.

#---------------------------------------------------------------------------
# FILE LOCATIONS
#---------------------------------------------------------------------------

# The default values of these variables are driven from the -D command line
# switch or PGDATA environment variable, represented here as ConfigDir.

#data_directory = 'ConfigDir'           # use data in another directory
#hba_file = 'ConfigDir/pg_hba.conf'     # host-based authentication file
#ident_file = 'ConfigDir/pg_ident.conf' # IDENT configuration file

# If external_pid_file is not explicitly set, no extra pid file is written.
#external_pid_file = '(none)'           # write an extra pid file

#---------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#---------------------------------------------------------------------------

# - Connection Settings -

listen_addresses = '*'          # what IP address(es) to listen on;
                                # comma-separated list of addresses;
                                # defaults to 'localhost', '*' = all
#port = 5432
max_connections = 10
# note: increasing max_connections costs ~400 bytes of shared memory per
# connection slot, plus lock space (see max_locks_per_transaction). You
# might also need to raise shared_buffers to support more connections.
#superuser_reserved_connections = 2
#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777 # octal
#bonjour_name = ''              # defaults to the computer name

# - Security & Authentication -

#authentication_timeout = 60    # 1-600, in seconds
#ssl = off
#password_encryption = on
#db_user_namespace = off

# Kerberos
#krb_server_keyfile = ''
#krb_srvname = 'postgres'
#krb_server_hostname = ''       # empty string matches any keytab entry
#krb_caseins_users = off

# - TCP Keepalives -
# see 'man 7 tcp' for details

#tcp_keepalives_idle = 0        # TCP_KEEPIDLE, in seconds;
                                # 0 selects the system default
#tcp_keepalives_interval = 0    # TCP_KEEPINTVL, in seconds;
                                # 0 selects the system default
#tcp_keepalives_count = 0       # TCP_KEEPCNT;
                                # 0 selects the system default

#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------

# - Memory -

shared_buffers = 50000          # min 16 or max_connections*2, 8KB each
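The units in this file matter: shared_buffers and effective_cache_size are counted in 8 KB pages, while work_mem and maintenance_work_mem are in kilobytes. A quick back-of-the-envelope check (my arithmetic, not anything Postgres runs; shared_buffers taken as the ~400 MB / 50000 pages implied by Scott's reply) shows how far the posted settings are from the machine's 4 GB:

```python
PAGE = 8 * 1024   # shared_buffers / effective_cache_size: 8KB pages
KB = 1024         # work_mem / maintenance_work_mem: kilobytes
GB = 1024**3

settings = {
    "shared_buffers": 50000 * PAGE,           # ~0.38 GB -- on the low side
    "work_mem": 3145728 * KB,                 # 3 GB *per sort*, per backend
    "maintenance_work_mem": 4194304 * KB,     # 4 GB -- the whole machine
    "effective_cache_size": 41943040 * PAGE,  # 320 GB of assumed OS cache
}

for name, size in settings.items():
    print(f"{name:22s} {size / GB:8.2f} GB")

# Worst case: one big sort in each of the 10 allowed connections could
# ask for 10 * 3 GB = 30 GB on a 4 GB box.
worst_case_sort_memory = 10 * settings["work_mem"]
print(worst_case_sort_memory / GB)   # -> 30.0
```

This is the arithmetic behind the "recipe for disaster" comment in the reply above: work_mem is a per-sort allowance, not a global cap.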
RES: RES: [PERFORM] select on 1milion register = 6s
Data_encerramento is a timestamp column. I will try your tip. Thanks

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On behalf of Ragnar
Sent: Saturday, July 28, 2007 19:36
To: Bruno Rodrigues Siqueira
Cc: pgsql-performance@postgresql.org
Subject: Re: RES: [PERFORM] select on 1milion register = 6s

On Sat, 2007-07-28 at 17:12 -0300, Bruno Rodrigues Siqueira wrote:
> where
>
> to_char( data_encerramento ,'yyyy-mm')
> between '2006-12' and '2007-01'

assuming data_encerramento is a date column, try:

WHERE data_encerramento between '2006-12-01' and '2007-01-31'

gnari

---(end of broadcast)---
TIP 6: explain analyze is your friend

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
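Since data_encerramento turns out to be a timestamp rather than a date, an inclusive upper bound like '2007-01-31' would miss rows later that same day. A small hypothetical helper (the function name and shape are mine, not from the thread) that turns the report's 'yyyy-mm' strings into half-open timestamp bounds:

```python
from datetime import date

def month_bounds(start_ym, end_ym):
    """Return [start, end) date bounds covering whole months.

    month_bounds('2006-12', '2007-01') -> ('2006-12-01', '2007-02-01'),
    suitable for: WHERE data_encerramento >= %s AND data_encerramento < %s
    """
    sy, sm = map(int, start_ym.split("-"))
    ey, em = map(int, end_ym.split("-"))
    # First day of the month *after* end_ym, handling December rollover.
    if em == 12:
        ey, em = ey + 1, 1
    else:
        em += 1
    return date(sy, sm, 1).isoformat(), date(ey, em, 1).isoformat()

print(month_bounds("2006-12", "2007-01"))   # -> ('2006-12-01', '2007-02-01')
```

Passing the two bounds as ordinary query parameters keeps the predicate directly on the column, so a plain index on data_encerramento and the statistics from ANALYZE both apply.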
Re: RES: [PERFORM] select on 1milion register = 6s
On Sat, 2007-07-28 at 17:12 -0300, Bruno Rodrigues Siqueira wrote:
> where
>
> to_char( data_encerramento ,'yyyy-mm')
> between '2006-12' and '2007-01'

assuming data_encerramento is a date column, try:

WHERE data_encerramento between '2006-12-01' and '2007-01-31'

gnari

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [PERFORM] Slow query with backwards index scan
* Craig James <[EMAIL PROTECTED]> [20070728 22:00]:
> >> SELECT * FROM (
> >>   (SELECT * FROM large_table lt
> >>    WHERE lt.user_id = 12345
> >>    ORDER BY created_at DESC LIMIT 10) AS q1
> >>   UNION
> >>   (SELECT * FROM large_table lt
> >>    WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
> >>    ORDER BY created_at DESC LIMIT 10) AS q2
> >> ORDER BY created_at DESC LIMIT 10;
> >
> > It's not possible to use ORDER BY or LIMIT within unioned queries.
> >
> > http://www.postgresql.org/docs/8.2/static/sql-select.html#SQL-UNION
>
> If I'm reading this documentation correctly, it *is* possible, as long as
> they're inside of a sub-select, as in this case.

I completely overlooked that obvious note in the documentation, sorry. I tried it only with the aliases, which fooled me into thinking that it doesn't work at all:

testdb=# (select 1 limit 1) as q1 union (select 2) as q2;
ERROR:  syntax error at or near "as"
LINE 1: (select 1 limit 1) as q1 union (select 2) as q2;
                           ^
but this works:

testdb=# (select 1 limit 1) union (select 2);
 ?column?
----------
        1
        2

Great - that works! What I didn't realize in the original post is that the problem actually seems to be how to retrieve the rows from large_table for the correlated relationships in an efficient way - the second of the two queries that could be UNIONed.
Using a subselect is efficient for the user with few relationships and matched rows at the end of the sorted large_table:

testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
         WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
         ORDER BY created_at DESC LIMIT 10;

QUERY PLAN
---------------------------------------------------------------------------
Limit (cost=6963.94..6963.96 rows=10 width=621) (actual time=94.598..94.629 rows=4 loops=1)
  -> Sort (cost=6963.94..6966.96 rows=1211 width=621) (actual time=94.592..94.602 rows=4 loops=1)
     Sort Key: lt.created_at
     -> Nested Loop (cost=39.52..6901.92 rows=1211 width=621) (actual time=85.670..94.547 rows=4 loops=1)
        -> HashAggregate (cost=39.52..39.53 rows=1 width=4) (actual time=23.549..23.552 rows=1 loops=1)
           -> Bitmap Heap Scan on relationships (cost=4.33..39.49 rows=10 width=4) (actual time=23.526..23.530 rows=1 loops=1)
              Recheck Cond: (user_id = 12345)
              -> Bitmap Index Scan on relationships_user_id_contact_id_index (cost=0.00..4.33 rows=10 width=0) (actual time=0.027..0.027 rows=1 loops=1)
                 Index Cond: (user_id = 12345)
        -> Index Scan using large_user_id_started_at_index on large_table lt (cost=0.00..6834.04 rows=2268 width=621) (actual time=62.108..70.952 rows=4 loops=1)
           Index Cond: (lt.user_id = relationships.contact_id)
Total runtime: 94.875 ms

But the subselect is not fast for the user with many relationships and matched rows at the beginning of the sorted large_table:

testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
         WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=5)
         ORDER BY created_at DESC LIMIT 10;

QUERY PLAN
---------------------------------------------------------------------------
Limit (cost=6963.94..6963.96 rows=10 width=621) (actual time=53187.349..53187.424 rows=10 loops=1)
  -> Sort (cost=6963.94..6966.96 rows=1211 width=621) (actual time=53187.341..53187.360 rows=10 loops=1)
     Sort Key: lt.created_at
     -> Nested Loop (cost=39.52..6901.92 rows=1211 width=621) (actual time=201.728..52673.800 rows=69018 loops=1)
        -> HashAggregate (cost=39.52..39.53 rows=1 width=4) (actual time=178.777..178.966 rows=40 loops=1)
           -> Bitmap Heap Scan on relationships (cost=4.33..39.49 rows=10 width=4) (actual time=47.049..178.560 rows=40 loops=1)
              Recheck Cond: (user_id = 5)
              -> Bitmap Index Scan on relationships_user_id_contact_id_index (cost=0.00..4.33 rows=10 width=0) (actual time=28.721..28.721 rows=40 loops=1)
                 Index Cond: (user_id = 5)
        -> Index Scan using large_user_id_started_
Re: [PERFORM] Slow query with backwards index scan
As other posters have pointed out, you can overcome the ORDER BY/LIMIT restriction on UNIONs with parentheses. I think I misbalanced the parentheses in my original post, which would have caused an error if you just copied and pasted. I don't think the limitation has to do with planning--just parsing the query. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
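The per-branch ORDER BY/LIMIT pattern from this subthread can be exercised end to end with Python's sqlite3, used here only as a stand-in (SQLite wants each pre-limited branch wrapped as a FROM subquery, whereas Postgres accepts the parenthesized branches directly; table names follow the thread, the data is made up). The key correctness argument: the global top N is always contained in the union of the branches' top N, so each branch only needs to produce its own top N.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE large_table (user_id INT, created_at INT);
CREATE TABLE relationships (user_id INT, contact_id INT);
INSERT INTO relationships VALUES (12345, 7);
-- rows for user 12345, their single contact (user 7), and a stranger
INSERT INTO large_table VALUES
  (12345, 100), (12345, 50), (12345, 10),
  (7, 90), (7, 80), (7, 5),
  (99, 999);  -- unrelated user, must not appear
""")

rows = con.execute("""
SELECT * FROM (SELECT * FROM large_table
               WHERE user_id = 12345
               ORDER BY created_at DESC LIMIT 3)
UNION
SELECT * FROM (SELECT * FROM large_table
               WHERE user_id IN (SELECT contact_id FROM relationships
                                 WHERE user_id = 12345)
               ORDER BY created_at DESC LIMIT 3)
ORDER BY created_at DESC LIMIT 3
""").fetchall()

print(rows)   # -> [(12345, 100), (7, 90), (7, 80)]
```

Each branch can then use its own index-backed ordered scan, which is exactly why the pattern was proposed for the slow second branch.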
RES: [PERFORM] select on 1milion register = 6s
Ok.

Query:

EXPLAIN ANALYZE
select to_char(data_encerramento,'mm/yyyy') as opcoes_mes,
       to_char(data_encerramento,'yyyy-mm') as ordem
from detalhamento_bas
where to_char( data_encerramento ,'yyyy-mm')
      between '2006-12' and '2007-01'
GROUP BY opcoes_mes, ordem
ORDER BY ordem DESC

QUERY PLAN:

Sort (cost=11449.37..11449.40 rows=119 width=8) (actual time=14431.537..14431.538 rows=2 loops=1)
  Sort Key: to_char(data_encerramento, 'yyyy-mm'::text)
  -> HashAggregate (cost=11448.79..11448.96 rows=119 width=8) (actual time=14431.521..14431.523 rows=2 loops=1)
     -> Index Scan using detalhamento_bas_idx3003 on detalhamento_bas (cost=0.00..11442.95 rows=11679 width=8) (actual time=0.135..12719.155 rows=2335819 loops=1)
        Index Cond: ((to_char(data_encerramento, 'yyyy-mm'::text) >= '2006-12'::text) AND (to_char(data_encerramento, 'yyyy-mm'::text) <= '2007-01'::text))
Total runtime: 14431.605 ms

SERVER
DELL PowerEdge 2950
XEON Quad-Core 3.0GHz
4Gb RAM
Linux CentOS 5.0 64-bits
Postgres 8.1.4

Postgresql.conf:

# -----------------------------
# PostgreSQL configuration file
# -----------------------------
#
# This file consists of lines of the form:
#
#   name = value
#
# (The '=' is optional.) White space may be used. Comments are introduced
# with '#' anywhere on a line. The complete list of option names and
# allowed values can be found in the PostgreSQL documentation. The
# commented-out settings shown in this file represent the default values.
#
# Please note that re-commenting a setting is NOT sufficient to revert it
# to the default value, unless you restart the postmaster.
#
# Any option can also be given as a command line switch to the
# postmaster, e.g. 'postmaster -c log_connections=on'. Some options
# can be changed at run-time with the 'SET' SQL command.
#
# This file is read on postmaster startup and when the postmaster
# receives a SIGHUP. If you edit the file on a running system, you have
# to SIGHUP the postmaster for the changes to take effect, or use
# "pg_ctl reload". Some settings, such as listen_addresses, require
# a postmaster shutdown and restart to take effect.

#---------------------------------------------------------------------------
# FILE LOCATIONS
#---------------------------------------------------------------------------

# The default values of these variables are driven from the -D command line
# switch or PGDATA environment variable, represented here as ConfigDir.

#data_directory = 'ConfigDir'           # use data in another directory
#hba_file = 'ConfigDir/pg_hba.conf'     # host-based authentication file
#ident_file = 'ConfigDir/pg_ident.conf' # IDENT configuration file

# If external_pid_file is not explicitly set, no extra pid file is written.
#external_pid_file = '(none)'           # write an extra pid file

#---------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#---------------------------------------------------------------------------

# - Connection Settings -

listen_addresses = '*'          # what IP address(es) to listen on;
                                # comma-separated list of addresses;
                                # defaults to 'localhost', '*' = all
#port = 5432
max_connections = 10
# note: increasing max_connections costs ~400 bytes of shared memory per
# connection slot, plus lock space (see max_locks_per_transaction). You
# might also need to raise shared_buffers to support more connections.
#superuser_reserved_connections = 2
#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777 # octal
#bonjour_name = ''              # defaults to the computer name

# - Security & Authentication -

#authentication_timeout = 60    # 1-600, in seconds
#ssl = off
#password_encryption = on
#db_user_namespace = off

# Kerberos
#krb_server_keyfile = ''
#krb_srvname = 'postgres'
#krb_server_hostname = ''       # empty string matches any keytab entry
#krb_caseins_users = off

# - TCP Keepalives -
# see 'man 7 tcp' for details

#tcp_keepalives_idle = 0        # TCP_KEEPIDLE, in seconds;
                                # 0 selects the system default
#tcp_keepalives_interval = 0    # TCP_KEEPINTVL, in seconds;
                                # 0 selects the system default
#tcp_keepalives_count = 0       # TCP_KEEPCNT;
                                # 0 selects the system default

#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------

# - Memory -

shared_buffers = 50000          # min 16 or max_connections*2, 8KB each
temp_buffers = 1000             # min 1
Re: [PERFORM] Slow query with backwards index scan
Tilmann Singer wrote:
> * [EMAIL PROTECTED] <[EMAIL PROTECTED]> [20070728 21:05]:
> > Let's try putting the sort/limit in each piece of the UNION to speed
> > them up separately.
> >
> > SELECT * FROM (
> >   (SELECT * FROM large_table lt
> >    WHERE lt.user_id = 12345
> >    ORDER BY created_at DESC LIMIT 10) AS q1
> >   UNION
> >   (SELECT * FROM large_table lt
> >    WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
> >    ORDER BY created_at DESC LIMIT 10) AS q2
> > ORDER BY created_at DESC LIMIT 10;
>
> It's not possible to use ORDER BY or LIMIT within unioned queries.
>
> http://www.postgresql.org/docs/8.2/static/sql-select.html#SQL-UNION

"ORDER BY and LIMIT can be attached to a subexpression if it is enclosed in parentheses"

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Slow query with backwards index scan
Tilmann Singer wrote:
> * [EMAIL PROTECTED] <[EMAIL PROTECTED]> [20070728 21:05]:
> > Let's try putting the sort/limit in each piece of the UNION to speed
> > them up separately.
> >
> > SELECT * FROM (
> >   (SELECT * FROM large_table lt
> >    WHERE lt.user_id = 12345
> >    ORDER BY created_at DESC LIMIT 10) AS q1
> >   UNION
> >   (SELECT * FROM large_table lt
> >    WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
> >    ORDER BY created_at DESC LIMIT 10) AS q2
> > ORDER BY created_at DESC LIMIT 10;
>
> It's not possible to use ORDER BY or LIMIT within unioned queries.
>
> http://www.postgresql.org/docs/8.2/static/sql-select.html#SQL-UNION

If I'm reading this documentation correctly, it *is* possible, as long as they're inside of a sub-select, as in this case.

Craig

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] select on 1milion register = 6s
Have you analyzed your table before doing this?

On Saturday, July 28, 2007, Bruno Rodrigues Siqueira wrote:
> Friends,
>
> Who can help me? My SELECT on a table with 1 million registers, using an
> expression index, takes 6 seconds.
>
> Please, I don't know how to make it better.
>
> Thanks
--
Hervé Piedvache

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] select on 1milion register = 6s
Bruno Rodrigues Siqueira wrote:
> Who can help me? My SELECT on a table with 1 million registers, using an
> expression index, takes 6 seconds…

Run your query using

EXPLAIN ANALYZE SELECT ... your query ...

and then post the results to this newsgroup. Nobody can help until they see the results of EXPLAIN ANALYZE.

Also, include all other relevant information, such as Postgres version, operating system, amount of memory, and any changes you have made to the Postgres configuration file.

Craig

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
[PERFORM] select on 1milion register = 6s
Friends,

Who can help me? My SELECT on a table with 1 million registers, using an expression index, takes 6 seconds.

Please, I don't know how to make it better.

Thanks
Re: [PERFORM] Slow query with backwards index scan
* [EMAIL PROTECTED] <[EMAIL PROTECTED]> [20070728 21:05]:
> Let's try putting the sort/limit in each piece of the UNION to speed
> them up separately.
>
> SELECT * FROM (
>   (SELECT * FROM large_table lt
>    WHERE lt.user_id = 12345
>    ORDER BY created_at DESC LIMIT 10) AS q1
>   UNION
>   (SELECT * FROM large_table lt
>    WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
>    ORDER BY created_at DESC LIMIT 10) AS q2
> ORDER BY created_at DESC LIMIT 10;

It's not possible to use ORDER BY or LIMIT within unioned queries.

http://www.postgresql.org/docs/8.2/static/sql-select.html#SQL-UNION

Would that make sense at all given the way the postgresql planner works? Does that work in other DBs?

Til

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Slow query with backwards index scan
Tilmann Singer <[EMAIL PROTECTED]> wrote ..
> * Nis Jørgensen <[EMAIL PROTECTED]> [20070727 20:31]:
> > How does the "obvious" UNION query do - ie:
> >
> > SELECT * FROM (
> >   SELECT * FROM large_table lt
> >   WHERE lt.user_id = 12345
> >
> >   UNION
> >
> >   SELECT * FROM large_table lt
> >   WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
> > ) q
> >
> > ORDER BY created_at DESC LIMIT 10;

Let's try putting the sort/limit in each piece of the UNION to speed them up separately.

SELECT * FROM (
  (SELECT * FROM large_table lt
   WHERE lt.user_id = 12345
   ORDER BY created_at DESC LIMIT 10) AS q1
  UNION
  (SELECT * FROM large_table lt
   WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
   ORDER BY created_at DESC LIMIT 10) AS q2
ORDER BY created_at DESC LIMIT 10;

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [PERFORM] Vacuum looping?
On Fri, Jul 27, 2007 at 05:32:11PM -0400, Steven Flatt wrote:
> weren't convinced was doing anything), then start a manual vacuum with a
> higher vacuum_cost_limit to get things cleaned up quicker.

What are your vacuum_cost_* settings? If you set those too aggressively you'll be in big trouble.

The second pass on the vacuum means that maintenance_work_mem isn't large enough.
--
Jim Nasby   [EMAIL PROTECTED]
EnterpriseDB   http://enterprisedb.com   512.569.9461 (cell)
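The "second pass" behavior follows from how lazy vacuum uses maintenance_work_mem: it remembers each dead row as a 6-byte item pointer, and when the array fills up it must scan every index, discard the array, and resume the heap scan. A rough sizing sketch (my arithmetic and function names, not a Postgres API) for how many index passes a given setting implies:

```python
TID_BYTES = 6   # size of one dead-tuple item pointer remembered by vacuum
KB = 1024

def index_passes(dead_tuples, maintenance_work_mem_kb):
    """Estimate how many times vacuum must scan each index."""
    capacity = (maintenance_work_mem_kb * KB) // TID_BYTES
    # One pass per full batch of remembered dead tuples (ceiling division).
    return -(-dead_tuples // capacity)

# 10 million dead rows:
print(index_passes(10_000_000, 16 * 1024))    # 16MB  -> 4 index passes
print(index_passes(10_000_000, 256 * 1024))   # 256MB -> 1 index pass
```

This is why raising maintenance_work_mem (within reason, given the 3 GB/4 GB cautions elsewhere in this digest) makes large vacuums finish in a single index pass.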
Re: [PERFORM] Vacuum looping?
"Steven Flatt" <[EMAIL PROTECTED]> writes: > The vacuum then just sat there. What I can't understand is why it went back > for a second pass of the pkey index? There was nothing writing to the table > once the vacuum began. Is this behaviour expected? Yes (hint: the numbers tell me what your maintenance_work_mem setting is). You should have left it alone, probably, though there seems to be something funny about your foo_1 index --- why was that so much slower than the others for the first pass? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Slow query with backwards index scan
* Nis Jørgensen <[EMAIL PROTECTED]> [20070727 20:31]:
> How does the "obvious" UNION query do - ie:
>
> SELECT * FROM (
>   SELECT * FROM large_table lt
>   WHERE lt.user_id = 12345
>
>   UNION
>
>   SELECT * FROM large_table lt
>   WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
> ) q
>
> ORDER BY created_at DESC LIMIT 10;

Great for the user with little data:

testdb=# EXPLAIN ANALYZE SELECT * FROM (
           SELECT * FROM large_table lt WHERE lt.user_id = 12345
           UNION
           SELECT * FROM large_table lt
           WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
         ) q ORDER BY created_at DESC LIMIT 10;

QUERY PLAN
---------------------------------------------------------------------------
Limit (cost=14673.77..14673.80 rows=10 width=3140) (actual time=133.877..133.946 rows=10 loops=1)
  -> Sort (cost=14673.77..14682.22 rows=3378 width=3140) (actual time=133.870..133.894 rows=10 loops=1)
     Sort Key: q.created_at
     -> Unique (cost=14315.34..14442.01 rows=3378 width=622) (actual time=133.344..133.705 rows=38 loops=1)
        -> Sort (cost=14315.34..14323.78 rows=3378 width=622) (actual time=133.337..133.432 rows=38 loops=1)
           Sort Key: id, user_id, plaze_id, device, started_at, updated_at, status, "type", duration, permission, created_at, mac_address, subnet, msc
           -> Append (cost=0.00..14117.35 rows=3378 width=622) (actual time=39.144..133.143 rows=38 loops=1)
              -> Index Scan using large_user_id_started_at_index on large_table lt (cost=0.00..7243.59 rows=2158 width=622) (actual time=39.138..109.831 rows=34 loops=1)
                 Index Cond: (user_id = 12345)
              -> Nested Loop (cost=42.78..6839.98 rows=1220 width=622) (actual time=14.859..23.112 rows=4 loops=1)
                 -> HashAggregate (cost=42.78..42.79 rows=1 width=4) (actual time=8.092..8.095 rows=1 loops=1)
                    -> Bitmap Heap Scan on relationships (cost=4.34..42.75 rows=11 width=4) (actual time=8.067..8.070 rows=1 loops=1)
                       Recheck Cond: (user_id = 12345)
                       -> Bitmap Index Scan on relationships_user_id_index (cost=0.00..4.34 rows=11 width=0) (actual time=8.057..8.057 rows=1 loops=1)
                          Index Cond: (user_id = 12345)
                 -> Index Scan using large_user_id_started_at_index on large_table lt (cost=0.00..6768.48 rows=2297 width=622) (actual time=6.751..14.970 rows=4 loops=1)
                    Index Cond: (lt.user_id = relationships.contact_id)
Total runtime: 134.220 ms

Not so great for the user with many early matches:

testdb=# EXPLAIN ANALYZE SELECT * FROM (
           SELECT * FROM large_table lt WHERE lt.user_id = 5
           UNION
           SELECT * FROM large_table lt
           WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=5)
         ) q ORDER BY created_at DESC LIMIT 10;

QUERY PLAN
---------------------------------------------------------------------------
Limit (cost=14673.77..14673.80 rows=10 width=3140) (actual time=3326.304..3326.367 rows=10 loops=1)
  -> Sort (cost=14673.77..14682.22 rows=3378 width=3140) (actual time=3326.297..3326.318 rows=10 loops=1)
     Sort Key: q.created_at
     -> Unique (cost=14315.34..14442.01 rows=3378 width=622) (actual time=2413.070..3019.385 rows=69757 loops=1)
        -> Sort (cost=14315.34..14323.78 rows=3378 width=622) (actual time=2413.062..2590.354 rows=69757 loops=1)
           Sort Key: id, user_id, plaze_id, device, started_at, updated_at, status, "type", duration, permission, created_at, mac_address, subnet, msc
           -> Append (cost=0.00..14117.35 rows=3378 width=622) (actual time=0.067..1911.626 rows=69757 loops=1)
              -> Index Scan using large_user_id_started_at_index on large_table lt (cost=0.00..7243.59 rows=2158 width=622) (actual time=0.062..3.440 rows=739 loops=1)
                 Index Cond: (user_id = 5)
              -> Nested Loop (cost=42.78..6839.98 rows=1220 width=622) (actual time=0.451..1557.901 rows=69018 loops=1)
                 -> HashAggregate (cost=42.78..42.79 rows=1 width=4) (actual time=0.404..0.580 rows=40 loop