Re: [PERFORM] GiST index performance
Hello. There is another performance problem with this query. If you analyse the query plan, you can see that most of the time is lost in sequential scans, and you have two seq scans. You should create additional indexes to match the query: PostgreSQL depends heavily on indexes for its performance. Regarding GiST versus btree, I personally had better performance with btree.

Regards
david

Matthew Wakeling wrote:

I have been doing some queries that are best answered with GiST indexes, however I have found that their performance is a little lacking. I thought I would do a direct comparison on a level playing field. Here are two EXPLAIN ANALYSE results for the same query, with two different indexes. The two indexes are identical except that one is btree and the other GiST.

Here is the query:

SELECT *
FROM location l1, location l2, gene, primer
WHERE l1.subjectid <> l2.subjectid
  AND l1.objectid = l2.objectid
  AND l1.subjectid = gene.id
  AND l2.subjectid = primer.id
  AND l2.intermine_start <= l1.intermine_start
  AND l2.intermine_end >= l1.intermine_start

Here is the btree index:

CREATE INDEX location_object_start ON location (objectid, intermine_start);

                                    QUERY PLAN
---------------------------------------------------------------------------
 Hash Join  (cost=26213.16..135980894.76 rows=3155740824 width=484) (actual time=2799.260..14256.588 rows=2758 loops=1)
   Hash Cond: (l1.subjectid = gene.id)
   ->  Nested Loop  (cost=0.00..4364485.01 rows=8891802645 width=324) (actual time=9.748..10418.807 rows=390695 loops=1)
         Join Filter: (l1.subjectid <> l2.subjectid)
         ->  Nested Loop  (cost=0.00..446862.58 rows=572239 width=259) (actual time=9.720..4226.117 rows=211880 loops=1)
               ->  Seq Scan on primer  (cost=0.00..15358.80 rows=211880 width=194) (actual time=9.678..579.877 rows=211880 loops=1)
               ->  Index Scan using location__key_all on location l2  (cost=0.00..2.00 rows=3 width=65) (actual time=0.004..0.007 rows=1 loops=211880)
                     Index Cond: (l2.subjectid = primer.id)
         ->  Index Scan using location_object_start on location l1  (cost=0.00..3.85 rows=150 width=65) (actual time=0.005..0.012 rows=3 loops=211880)
               Index Cond: ((l1.objectid = l2.objectid) AND (l2.intermine_start <= l1.intermine_start) AND (l2.intermine_end >= l1.intermine_start))
   ->  Hash  (cost=20496.96..20496.96 rows=457296 width=160) (actual time=2788.698..2788.698 rows=457296 loops=1)
         ->  Seq Scan on gene  (cost=0.00..20496.96 rows=457296 width=160) (actual time=0.038..1420.604 rows=457296 loops=1)
 Total runtime: 14263.846 ms
(13 rows)

Here is the GiST index:

CREATE INDEX location_object_start_gist ON location USING gist (objectid, intermine_start);

                                    QUERY PLAN
---------------------------------------------------------------------------
 Hash Join  (cost=26213.16..136159960.32 rows=3155740824 width=484) (actual time=2576.109..2300486.267 rows=2758 loops=1)
   Hash Cond: (l1.subjectid = gene.id)
   ->  Nested Loop  (cost=0.00..4543550.56 rows=8891802645 width=324) (actual time=366.121..2296668.740 rows=390695 loops=1)
         Join Filter: (l1.subjectid <> l2.subjectid)
         ->  Nested Loop  (cost=0.00..446862.58 rows=572239 width=259) (actual time=362.774..13423.443 rows=211880 loops=1)
               ->  Seq Scan on primer  (cost=0.00..15358.80 rows=211880 width=194) (actual time=319.559..1296.907 rows=211880 loops=1)
               ->  Index Scan using location__key_all on location l2  (cost=0.00..2.00 rows=3 width=65) (actual time=0.041..0.045 rows=1 loops=211880)
                     Index Cond: (l2.subjectid = primer.id)
         ->  Index Scan using location_object_start_gist on location l1  (cost=0.00..4.16 rows=150 width=65) (actual time=3.354..10.757 rows=3 loops=211880)
               Index Cond: ((l1.objectid = l2.objectid) AND (l2.intermine_start <= l1.intermine_start) AND (l2.intermine_end >= l1.intermine_start))
   ->  Hash  (cost=20496.96..20496.96 rows=457296 width=160) (actual time=2157.914..2157.914 rows=457296 loops=1)
         ->  Seq Scan on gene  (cost=0.00..20496.96 rows=457296 width=160) (actual time=3.904..1206.907 rows=457296 loops=1)
 Total runtime: 2300510.674 ms
(13 rows)

The query plans are identical except in the type of index used, but there is a factor of a few hundred in execution time. Is this the kind of factor that would be expected, or is there something amiss? Is this seen as something that might be improved in the future?

Matthew
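For anyone who wants to reproduce the effect at a smaller scale, a sketch along these lines compares the two index types on identical data. The table and column names here are invented for illustration, and note that a GiST index over plain integer columns needs the btree_gist contrib module:

```sql
-- Hypothetical reproduction case; "test_loc" is not Matthew's schema.
-- Requires the btree_gist contrib module for GiST over integers
-- (on 9.1+: CREATE EXTENSION btree_gist).
CREATE TABLE test_loc (objectid int, intermine_start int);
INSERT INTO test_loc
    SELECT i % 1000, i FROM generate_series(1, 1000000) AS s(i);

CREATE INDEX test_loc_btree ON test_loc (objectid, intermine_start);
CREATE INDEX test_loc_gist  ON test_loc USING gist (objectid, intermine_start);
ANALYZE test_loc;

-- Drop one index at a time (or compare "actual time" per index scan)
-- to see how much of the runtime each index type accounts for:
EXPLAIN ANALYZE
SELECT * FROM test_loc
WHERE objectid = 42 AND intermine_start <= 500000;
```

Per-tuple timings in the two plans above (0.005 ms vs 3.354 ms per inner index scan, over 211880 loops) are where the factor of a few hundred accumulates.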
[PERFORM] NOW vs CURRENT_DATE
Hello everybody,

I just discovered a huge difference between NOW() and CURRENT_DATE. Did you already know about it, and do you know why?

DELETE FROM blacklist WHERE bl_date < (NOW() - interval '2 DAY');
-- on 6,000,000 records: 699 ms

DELETE FROM blacklist WHERE bl_date < (CURRENT_DATE - interval '2 DAY');
-- on 6,000,000 records: 0.065 ms

tx
david

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
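One thing worth ruling out before blaming the functions (this is a guess from the figures, not a diagnosis): 0.065 ms is implausibly fast for a DELETE that has to examine rows in a 6,000,000-row table. If the two statements were run back to back on the same table, the first DELETE already removed nearly every row the second one would match, so the second had almost nothing to do. Comparing the plans, and re-running each statement against a fresh copy of the table, separates planner effects from that. It is also worth checking that the expression's type matches bl_date's declared type, since NOW() yields timestamptz while CURRENT_DATE - interval yields timestamp:

```sql
-- Compare plans rather than wall-clock time; both NOW() and CURRENT_DATE
-- are stable within a statement, so either form can use an index on
-- bl_date if the data types line up.
EXPLAIN DELETE FROM blacklist WHERE bl_date < (NOW() - interval '2 DAY');
EXPLAIN DELETE FROM blacklist WHERE bl_date < (CURRENT_DATE - interval '2 DAY');
```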
Re: [PERFORM] Large number of tables slow insert
Hello to all,

I have a question regarding the following comments. How do you estimate vacuum aggressiveness? It is very difficult for me to set up autovacuum correctly. It seems to me that it is not aggressive enough, but when I change the settings the autovacuum process is almost always running. So how should I set it up for around 4 inserts, updates and deletes per 5 minutes?

regards
david

Scott Marlowe wrote:

On Sat, Aug 23, 2008 at 6:59 PM, Loic Petit <[EMAIL PROTECTED]> wrote:
> 1 table contains about 5 indexes : timestamp, one for each sensor type - 3,
> and one for packet counting (measures packet dropping) (I reckon that this
> is quite heavy, but at least the timestamp and the values are really useful)

But what's the update rate on these indexes and tables? I'm wondering if you're not vacuuming aggressively enough to keep up with bursty update patterns.
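For reference, these are the 8.3-era autovacuum knobs involved. The values below are illustrative starting points only, not recommendations for any particular workload; aggressiveness is mostly the interplay of the scale factor (fraction of a table that may go dead before a vacuum fires) and the cost delay (how hard each vacuum works once started):

```
# postgresql.conf -- illustrative values, not a tuned configuration
autovacuum = on
autovacuum_naptime = 1min              # how often tables are checked
autovacuum_vacuum_threshold = 50       # minimum dead rows before vacuuming...
autovacuum_vacuum_scale_factor = 0.2   # ...plus this fraction of the table
autovacuum_analyze_scale_factor = 0.1
autovacuum_vacuum_cost_delay = 20ms    # lower = each vacuum runs faster
```

Lowering the scale factor makes vacuums fire more often but each one does less work; raising the cost delay throttles them so they interfere less with foreground queries.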
[PERFORM] Distant mirroring
Hello,

I'm trying to put together a solution that lets me:
- secure the data, without RAID;
- grow the capacity of the database as the needs increase.

I have read about Slony, DRBD and pgpool, but I haven't found the right system to do what I want. I currently handle 50 million requests per month, and I suppose I will reach 100 million by the end of the year. There are 2 difficulties:

1 - Storage: for faster access, 15,000 RPM SAS disks are recommended, but the disks I can get are 149 GB each. As the database is growing by 1.7 GB per week at the moment, a disk will be full in 3 months. I can add at least 3 more disks, so that buys 9 months. What do I do after that, and especially what should I do today to prepare for it?

2 - The machine will handle more and more simultaneous traffic, so I need to load-balance the inserts/updates across several machines and replicate the data between them. It's not a real problem if the replication is asynchronous.

I'm migrating to PostgreSQL 8.3.3. Thanks for all your remarks, suggestions and help.

David
Re: [PERFORM] Plz Heeeelp! performance settings
Thanks for your reply.

You mean that RAID uses the fsync method for keeping the data's copy? So you are inviting me to deactivate fsync to increase performance? If I deactivate fsync, my second disk will not be up to date, so if the machine crashes I won't be able to get the server working again quickly? But if I use a second machine to replicate the database, I escape this problem, don't I? If I have understood right, could you tell me how to deactivate fsync, please?

Best regards
David

Merlin Moncure wrote:

On Wed, Aug 6, 2008 at 6:12 PM, dforum <[EMAIL PROTECTED]> wrote:
> Hello to all,
> I have a Linux 2.6.24.2--std-ipv4-64 #3 SMP Tue Feb 12 12:27:47 CET 2008
> x86_64 Intel(R) Xeon(R) CPU X5355 @ 2.66GHz GenuineIntel GNU/Linux with 8Gb
> of memory. Using sata II disk in RAID 1 (I known that is bad, but

this is likely your problem...with fsync on (as you have it), you will be lucky to get a couple of hundred transactions/sec out of the database. you are probably just exceeding your operational capabilities of the hardware so you probably need to upgrade or turn off fsync (which means data corruption in event of hard crash).

merlin
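For reference, fsync is a plain postgresql.conf setting, but turning it off means the database can be corrupted, not just out of date, after a crash. On 8.3 the safer trade is usually synchronous_commit: a crash may lose the last few committed transactions, but the database stays consistent. A sketch of the relevant lines (values illustrative, not a recommendation to run without crash safety):

```
# postgresql.conf -- sketch of the trade-offs
#fsync = on                 # leave on: off risks corruption after a crash
synchronous_commit = off    # 8.3+: may lose a few recent commits on crash,
                            # but never corrupts the database
wal_buffers = 1MB           # modest bump can help commit-heavy workloads
```

Replication to a second machine protects against losing the whole server, but it does not remove the corruption risk on the primary if fsync is off there.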
[PERFORM] Plz Heeeelp! performance settings
Hello to all,

I have a Linux 2.6.24.2--std-ipv4-64 #3 SMP Tue Feb 12 12:27:47 CET 2008 x86_64 Intel(R) Xeon(R) CPU X5355 @ 2.66GHz GenuineIntel GNU/Linux with 8 GB of memory, using SATA II disks in RAID 1 (I know that is bad, but I will change it as quickly as I can). I have a database of 38 GB which grows by 6 GB per week. I have a lot of updates and inserts, especially in 8 tables. 2 tables are used for temporary storage: I write something like 15,000 requests into them every 2 minutes and empty them every 10 minutes. I'm running updates and selects on tables containing more than 20 million rows.

I have the following postgresql.conf settings:

# -----------------------------
# PostgreSQL configuration file
# -----------------------------
#
# This file consists of lines of the form:
#
#   name = value
#
# (The '=' is optional.) White space may be used. Comments are introduced
# with '#' anywhere on a line. The complete list of option names and
# allowed values can be found in the PostgreSQL documentation. The
# commented-out settings shown in this file represent the default values.
#
# Please note that re-commenting a setting is NOT sufficient to revert it
# to the default value, unless you restart the postmaster.
#
# Any option can also be given as a command line switch to the
# postmaster, e.g. 'postmaster -c log_connections=on'. Some options
# can be changed at run-time with the 'SET' SQL command.
#
# This file is read on postmaster startup and when the postmaster
# receives a SIGHUP. If you edit the file on a running system, you have
# to SIGHUP the postmaster for the changes to take effect, or use
# "pg_ctl reload". Some settings, such as listen_addresses, require
# a postmaster shutdown and restart to take effect.

#---------------------------------------------------------------------------
# FILE LOCATIONS
#---------------------------------------------------------------------------

# The default values of these variables are driven from the -D command line
# switch or PGDATA environment variable, represented here as ConfigDir.
#data_directory = 'ConfigDir'           # use data in another directory
#hba_file = 'ConfigDir/pg_hba.conf'     # host-based authentication file
#ident_file = 'ConfigDir/pg_ident.conf' # IDENT configuration file

# If external_pid_file is not explicitly set, no extra pid file is written.
#external_pid_file = '(none)'           # write an extra pid file

#---------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#---------------------------------------------------------------------------

# - Connection Settings -

listen_addresses = 'xxx.xxx.xxx.xxx'    # what IP address(es) to listen on;
                                        # comma-separated list of addresses;
                                        # defaults to 'localhost', '*' = all
port = 5432
max_connections = 624
# note: increasing max_connections costs ~400 bytes of shared memory per
# connection slot, plus lock space (see max_locks_per_transaction). You
# might also need to raise shared_buffers to support more connections.
#superuser_reserved_connections = 2
#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777         # octal
#bonjour_name = ''                      # defaults to the computer name

# - Security & Authentication -

#authentication_timeout = 60            # 1-600, in seconds
#ssl = off
#password_encryption = on
#db_user_namespace = off

# Kerberos
#krb_server_keyfile = ''
#krb_srvname = 'postgres'
#krb_server_hostname = ''               # empty string matches any keytab entry
#krb_caseins_users = off

# - TCP Keepalives -
# see 'man 7 tcp' for details

tcp_keepalives_idle = 300               # TCP_KEEPIDLE, in seconds;
                                        # 0 selects the system default
#tcp_keepalives_interval = 0            # TCP_KEEPINTVL, in seconds;
                                        # 0 selects the system default
#tcp_keepalives_count = 0               # TCP_KEEPCNT;
                                        # 0 selects the system default

#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------

# - Memory -

shared_buffers = 25                     # min 16 or max_connections*2, 8KB each
temp_buffers = 500                      # min 100, 8KB each
max_prepared_transactions = 200         # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 9000                         # min 64, size in KB
maintenance_work_mem = 5000             # min 1024, size in KB
max_stack_depth = 8192                  # min 100, size in KB

# - Free Space Map -

max_fsm_pages = 10                      # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 5000
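One immediate observation on these settings: shared_buffers = 25 means 25 buffers of 8 kB, i.e. about 200 kB of shared cache, which is far too small for an 8 GB machine under this write load. From 8.2 onward the memory settings accept units directly. The values below are hypothetical starting points for an 8 GB box on 8.3, not a tuned configuration for this workload:

```
# postgresql.conf -- illustrative starting points for 8 GB RAM on 8.3
shared_buffers = 2GB            # ~25% of RAM; the posted 25 is ~200 kB
work_mem = 16MB                 # per sort/hash, per backend: with
                                # max_connections = 624, keep this modest
maintenance_work_mem = 256MB    # speeds up VACUUM and index builds
effective_cache_size = 6GB      # planner hint: OS cache + shared_buffers
max_fsm_pages = 1000000         # must cover dead pages between vacuums;
                                # the posted value of 10 is certainly too low
```

With heavy update/insert traffic, an undersized free space map means dead space is never reused and the 6 GB/week growth is partly bloat rather than data.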