Re: [PERFORM] GiST index performance

2009-04-16 Thread dforum

hello,

there is other performance problem on this request.

If you analyse query plan, you see that most of the time are lost during 
sequencial scan, and you have 2 seq scan.


You have to create other indexes to match the request.

Postgresq is totally dependant on index to reach is performance.

Regarding gist or btree, I personnaly had better performance with btree.

Regards

david

Matthew Wakeling a écrit :


I have been doing some queries that are best answered with GiST 
indexes, however I have found that their performance is a little 
lacking. I thought I would do a direct comparison on a level playing 
field. Here are two EXPLAIN ANALYSE results for the same query, with 
two different indexes. The two indexes are identical except that one 
is btree and the other GiST.


Here is the query:

SELECT *
FROM
location l1,
location l2,
gene,
primer
WHERE
l1.subjectid <> l2.subjectid
AND l1.objectid = l2.objectid
AND l1.subjectid = gene.id
AND l2.subjectid = primer.id
AND l2.intermine_start <= l1.intermine_start
AND l2.intermine_end >= l1.intermine_start

Here is the btree index:

CREATE INDEX location_object_start ON location (objectid, 
intermine_start);


QUERY PLAN
--
 Hash Join
   (cost=26213.16..135980894.76 rows=3155740824 width=484)
   (actual time=2799.260..14256.588 rows=2758 loops=1)
   Hash Cond: (l1.subjectid = gene.id)
   ->  Nested Loop
 (cost=0.00..4364485.01 rows=8891802645 width=324)
 (actual time=9.748..10418.807 rows=390695 loops=1)
 Join Filter: (l1.subjectid <> l2.subjectid)
 ->  Nested Loop
   (cost=0.00..446862.58 rows=572239 width=259)
   (actual time=9.720..4226.117 rows=211880 loops=1)
   ->  Seq Scan on primer
 (cost=0.00..15358.80 rows=211880 width=194)
 (actual time=9.678..579.877 rows=211880 loops=1)
   ->  Index Scan using location__key_all on location l2
 (cost=0.00..2.00 rows=3 width=65)
 (actual time=0.004..0.007 rows=1 loops=211880)
 Index Cond: (l2.subjectid = primer.id)
 ->  Index Scan using location_object_start on location l1
   (cost=0.00..3.85 rows=150 width=65)
   (actual time=0.005..0.012 rows=3 loops=211880)
   Index Cond: ((l1.objectid = l2.objectid) AND 
(l2.intermine_start <= l1.intermine_start) AND (l2.intermine_end >= 
l1.intermine_start))

   ->  Hash
 (cost=20496.96..20496.96 rows=457296 width=160)
 (actual time=2788.698..2788.698 rows=457296 loops=1)
 ->  Seq Scan on gene
   (cost=0.00..20496.96 rows=457296 width=160)
   (actual time=0.038..1420.604 rows=457296 loops=1)
 Total runtime: 14263.846 ms
(13 rows)


Here is the GiST index:

CREATE INDEX location_object_start_gist ON location USING gist 
(objectid, intermine_start);


QUERY PLAN

 Hash Join
   (cost=26213.16..136159960.32 rows=3155740824 width=484)
   (actual time=2576.109..2300486.267 rows=2758 loops=1)
   Hash Cond: (l1.subjectid = gene.id)
   ->  Nested Loop
 (cost=0.00..4543550.56 rows=8891802645 width=324)
 (actual time=366.121..2296668.740 rows=390695 loops=1)
 Join Filter: (l1.subjectid <> l2.subjectid)
 ->  Nested Loop
   (cost=0.00..446862.58 rows=572239 width=259)
   (actual time=362.774..13423.443 rows=211880 loops=1)
   ->  Seq Scan on primer
 (cost=0.00..15358.80 rows=211880 width=194)
 (actual time=319.559..1296.907 rows=211880 loops=1)
   ->  Index Scan using location__key_all on location l2
 (cost=0.00..2.00 rows=3 width=65)
 (actual time=0.041..0.045 rows=1 loops=211880)
 Index Cond: (l2.subjectid = primer.id)
 ->  Index Scan using location_object_start_gist on location l1
   (cost=0.00..4.16 rows=150 width=65)
   (actual time=3.354..10.757 rows=3 loops=211880)
   Index Cond: ((l1.objectid = l2.objectid) AND 
(l2.intermine_start <= l1.intermine_start) AND (l2.intermine_end >= 
l1.intermine_start))

   ->  Hash
 (cost=20496.96..20496.96 rows=457296 width=160)
 (actual time=2157.914..2157.914 rows=457296 loops=1)
 ->  Seq Scan on gene
   (cost=0.00..20496.96 rows=457296 width=160)
   (actual time=3.904..1206.907 rows=457296 loops=1)
 Total runtime: 2300510.674 ms
(13 rows)

The query plans are identical except in the type of index used, but 
there is a factor of a few hundred in execute time. Is this the kind 
of factor that would be expected, or is there something amiss? Is this 
seen as something that might be improved in the future?


Matthew




--
Sent via 

[PERFORM] NOW vs CURRENT_DATE

2008-08-24 Thread dforum

Hello every body,

I just discover a big not only big huge difference between NOW() and 
CURRENT_DATE.


Did you already know about it and do you know why ?

DELETE FROM blacklist where bl_date < (NOW() - interval '2 DAY');
on 6 000 000 of records
699 ms

DELETE FROM blacklist where bl_date < (CURRENT_DATE - interval '2 DAY');
on 6 000 000 of records

0.065 ms

tx

david

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Large number of tables slow insert

2008-08-24 Thread dforum

hello to all,

I've a question regarding the folowing comments.

How to estimate vacuum aggressiveness ?

It's for me very deficulte to setup the autovaccum setting correctly. It 
seems for me that it is not enough aggressive, but when I change the 
settings the autovacuum process is almost always running.


So how to setup it, for around 4 insert, update, delete per 5 minutes

regards

david
Scott Marlowe a écrit :

On Sat, Aug 23, 2008 at 6:59 PM, Loic Petit <[EMAIL PROTECTED]> wrote:
  

1 table contains about 5 indexes : timestamp, one for each sensor type - 3,
and one for packet counting (measures packet dropping)
(I reckon that this is quite heavy, but a least the timestamp and the values
are really usefull)



But what's the update rate on these indexes and tables?  I'm wondering
if you're not vacuuming aggresively enough to keep up with bursty
update patterns

  



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Distant mirroring

2008-08-09 Thread dforum

Hello,

I'm trying to install a solution to permit me to :
- Secure the datas, without RAID
- Giving ability to increase the potentiality of the database towards 
the needs.


I have read about slony, DRBD, pgpool

I don't find the good system to do what I want.

I manage for now 50 millions of request per month.

I will reach 100 millions in the end of the year I suppose.

There is 2 difficulties :
1 - is the storage : to get faster access,it is recommend to use SAS 15 
000 tps. But the disk I can get are 149 GO of space. As the database is 
growing par 1,7 Go per week at the moment, it will reach is maximum in 3 
month. I can add 3 disk at least so It can go to 9 month. What to do 
after, and especially what to do today to prevent it?
2 - The machine will treat more and more simultaneous entrance, so I 
need to loadbalance those inserts/updates on several machine and to 
replicate the datas between them. It's not a real problem if the data 
are asynchrony.


I'm migrating to postgresql 8.3.3.

Thanks for all your remarks, suggestions and helps

David

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Plz Heeeelp! performance settings

2008-08-07 Thread dforum

Tx for your reply.

You mean that RAID use fsync method for keeping data's copy.

So you invite me to desactivate fsync to increase the performance ?

Desactivating fsync. my second disk will not be uptodate, so if the 
machine crash, I wont be able to get the server working quickly??? But 
if I use a second machine to replicate the database, I escape this 
problem isn't it ?


If I understand right, could you tell me how to do desactivate fsync 
please ?


Best regards

David

Merlin Moncure a écrit :

On Wed, Aug 6, 2008 at 6:12 PM, dforum <[EMAIL PROTECTED]> wrote:
  

Hello to all,


I have a Linux 2.6.24.2--std-ipv4-64 #3 SMP Tue Feb 12 12:27:47 CET 2008
x86_64 Intel(R) Xeon(R) CPU X5355 @ 2.66GHz GenuineIntel GNU/Linux
with 8Gb of memory. Using sata II disk in RAID 1 (I known that is bad, but




this is likely your problem...with fsync on (as you have it), you will
be lucky to get a couple of hundred transactions/sec out of the
database.  you are probably just exceeding your operational
capabilities of the hardware so you probably need to upgrade or turn
off fsync (which means data corruption in event of hard crash).

merlin

  



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Plz Heeeelp! performance settings

2008-08-06 Thread dforum

Hello to all,


I have a Linux 2.6.24.2--std-ipv4-64 #3 SMP Tue Feb 12 12:27:47 CET 
2008 x86_64 Intel(R) Xeon(R) CPU X5355 @ 2.66GHz GenuineIntel GNU/Linux
with 8Gb of memory. Using sata II disk in RAID 1 (I known that is bad, 
but it would change has quickly I can).


I have a database of 38Go and take 6Go per week.

I have a lot of update and insert, especially in 8 tables. 2 tables are 
using for temporary storage, so I right something like 15000 request per 
2 minutes and empty it into 10 min.
I'm making some update or select on tables including more than 20 
millions of entrance.


I have the following postgresql.conf settings :


# -
# PostgreSQL configuration file
# -
#
# This file consists of lines of the form:
#
#   name = value
#
# (The '=' is optional.) White space may be used. Comments are introduced
# with '#' anywhere on a line. The complete list of option names and
# allowed values can be found in the PostgreSQL documentation. The
# commented-out settings shown in this file represent the default values.
#
# Please note that re-commenting a setting is NOT sufficient to revert it
# to the default value, unless you restart the postmaster.
#
# Any option can also be given as a command line switch to the
# postmaster, e.g. 'postmaster -c log_connections=on'. Some options
# can be changed at run-time with the 'SET' SQL command.
#
# This file is read on postmaster startup and when the postmaster
# receives a SIGHUP. If you edit the file on a running system, you have
# to SIGHUP the postmaster for the changes to take effect, or use
# "pg_ctl reload". Some settings, such as listen_addresses, require
# a postmaster shutdown and restart to take effect.


#---
# FILE LOCATIONS
#---

# The default values of these variables are driven from the -D command line
# switch or PGDATA environment variable, represented here as ConfigDir.

#data_directory = 'ConfigDir'   # use data in another directory
#hba_file = 'ConfigDir/pg_hba.conf' # host-based authentication file
#ident_file = 'ConfigDir/pg_ident.conf' # IDENT configuration file

# If external_pid_file is not explicitly set, no extra pid file is written.
#external_pid_file = '(none)'   # write an extra pid file


#---
# CONNECTIONS AND AUTHENTICATION
#---

# - Connection Settings -

listen_addresses = 'xxx.xxx.xxx.xxx'  # what IP address(es) 
to listen on;

   # comma-separated list of addresses;
   # defaults to 'localhost', '*' = all
port = 5432
max_connections = 624
# note: increasing max_connections costs ~400 bytes of shared memory per
# connection slot, plus lock space (see max_locks_per_transaction).  You
# might also need to raise shared_buffers to support more connections.
#superuser_reserved_connections = 2
#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777 # octal
#bonjour_name = ''  # defaults to the computer name

# - Security & Authentication -

#authentication_timeout = 60# 1-600, in seconds
#ssl = off
#password_encryption = on
#db_user_namespace = off

# Kerberos
#krb_server_keyfile = ''
#krb_srvname = 'postgres'
#krb_server_hostname = ''   # empty string matches any 
keytab entry

#krb_caseins_users = off

# - TCP Keepalives -
# see 'man 7 tcp' for details

tcp_keepalives_idle = 300   # TCP_KEEPIDLE, in seconds;
   # 0 selects the system default
#tcp_keepalives_interval = 0# TCP_KEEPINTVL, in seconds;
   # 0 selects the system default
#tcp_keepalives_count = 0   # TCP_KEEPCNT;
   # 0 selects the system default


#---
# RESOURCE USAGE (except WAL)
#---

# - Memory -

shared_buffers = 25 # min 16 or max_connections*2, 
8KB each

temp_buffers = 500  # min 100, 8KB each
max_prepared_transactions = 200 # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared 
memory

# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 9000 # min 64, size in KB
maintenance_work_mem = 5000 # min 1024, size in KB
max_stack_depth = 8192  # min 100, size in KB

# - Free Space Map -

max_fsm_pages = 10  # min max_fsm_relations*16, 6 
bytes each

max_fsm_relations = 5000