Re: [PERFORM] insert waits for delete with trigger

2004-08-10 Thread Litao Wu
Thank you.

How about:

select c.relname, l.pid, l.mode, l.granted,
a.current_query
from pg_locks l, pg_class c, pg_stat_activity a
where
  l.relation = c.oid
  AND l.pid =  a.procpid
order by l.granted, l.pid;


          relname           |  pid  |       mode       | granted |                           current_query
----------------------------+-------+------------------+---------+-----------------------------------------------------------------------
 q_20040810                 |   488 | AccessShareLock  | t       | <IDLE>
 q_20040810                 |   488 | RowExclusiveLock | t       | <IDLE>
 q_process                  |  3729 | AccessShareLock  | t       | DELETE FROM q_20040805 WHERE domain_id='2005761066' AND module='spam'
 q_process                  |  3729 | RowExclusiveLock | t       | DELETE FROM q_20040805 WHERE domain_id='2005761066' AND module='spam'
 q_20040805                 |  3729 | AccessShareLock  | t       | DELETE FROM q_20040805 WHERE domain_id='2005761066' AND module='spam'
 q_20040805                 |  3729 | RowExclusiveLock | t       | DELETE FROM q_20040805 WHERE domain_id='2005761066' AND module='spam'
 q_summary                  |  3729 | AccessShareLock  | t       | DELETE FROM q_20040805 WHERE domain_id='2005761066' AND module='spam'
 q_summary                  |  3729 | RowExclusiveLock | t       | DELETE FROM q_20040805 WHERE domain_id='2005761066' AND module='spam'
 q_summary_did_dir_idx      |  3729 | AccessShareLock  | t       | DELETE FROM q_20040805 WHERE domain_id='2005761066' AND module='spam'
 pg_shadow                  |  7660 | AccessShareLock  | t       | <IDLE>
 pg_locks                   |  7660 | AccessShareLock  | t       | <IDLE>
 pg_database                |  7660 | AccessShareLock  | t       | <IDLE>
 pg_class                   |  7660 | AccessShareLock  | t       | <IDLE>
 pg_stat_activity           |  7660 | AccessShareLock  | t       | <IDLE>
 pg_class_oid_index         |  7660 | AccessShareLock  | t       | <IDLE>
 q_process                  |  8593 | AccessShareLock  | t       | DELETE FROM q_20040810 WHERE domain_id='2002300623' AND module='spam'
 q_process                  |  8593 | RowExclusiveLock | t       | DELETE FROM q_20040810 WHERE domain_id='2002300623' AND module='spam'
 q_20040810                 |  8593 | AccessShareLock  | t       | DELETE FROM q_20040810 WHERE domain_id='2002300623' AND module='spam'
 q_20040810                 |  8593 | RowExclusiveLock | t       | DELETE FROM q_20040810 WHERE domain_id='2002300623' AND module='spam'
 q_summary                  |  8593 | AccessShareLock  | t       | DELETE FROM q_20040810 WHERE domain_id='2002300623' AND module='spam'
 q_summary                  |  8593 | RowExclusiveLock | t       | DELETE FROM q_20040810 WHERE domain_id='2002300623' AND module='spam'
 q_summary_did_dir_idx      |  8593 | AccessShareLock  | t       | DELETE FROM q_20040810 WHERE domain_id='2002300623' AND module='spam'
 q_process                  | 19027 | AccessShareLock  | t       | INSERT INTO q_process (...) SELECT ... FROM q_20040805 WHERE domain_id='2005761066' AND module='spam'
 q_process                  | 19027 | RowExclusiveLock | t       | INSERT INTO q_process (...) SELECT ... FROM q_20040805 WHERE domain_id='2005761066' AND module='spam'
 q_20040805                 | 19027 | AccessShareLock  | t       | INSERT INTO q_process (...) SELECT ... FROM q_20040805 WHERE domain_id='2005761066' AND module='spam'
 q_did_mod_dir_20040805_idx | 19027 | AccessShareLock  | t       | INSERT INTO q_process (...) SELECT ... FROM q_20040805 WHERE domain_id='2005761066' AND module='spam'
(26 rows)


ps -elfww | grep 19027
040 S postgres 19027   870  1  69   0 - 81290 semtim 07:31 ?  00:00:51 postgres: postgres mxl 192.168.0.177:38266 INSERT waiting

--- Tom Lane [EMAIL PROTECTED] wrote:

 Litao Wu [EMAIL PROTECTED] writes:
  Did I miss something?
 
 Your join omits all transaction locks.
 
   regards, tom lane
 





---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] The black art of postgresql.conf tweaking

2004-08-10 Thread CoL
hi,
Paul Serby wrote:
Can anyone give a good reference site/book for getting the most out of 
your postgres server.

All I can find is contradicting theories on how to work out your settings.
This is what I followed to setup our db server that serves our web 
applications.

http://www.phpbuilder.com/columns/smith20010821.php3?page=2
We have a Dell Poweredge with the following spec.
CPU: Intel(R) Xeon(TM) CPU 3.06GHz (512 KB Cache)
CPU: Intel(R) Xeon(TM) CPU 3.06GHz (512 KB Cache)
CPU: Intel(R) Xeon(TM) CPU 3.06GHz (512 KB Cache)
CPU: Intel(R) Xeon(TM) CPU 3.06GHz (512 KB Cache)
Physical Memory: 2077264 kB
Swap Memory: 2048244 kB
Apache on the Web server can take up to 300 connections and PHP is using 
 pg_pconnect

Postgres is set with the following.
max_connections = 300
shared_buffers = 38400
sort_mem = 12000
But Apache is still maxing out the non-super user connection limit.
The machine is under no load and I would like to up the max_connections 
but I would like to know more about what you need to consider before 
doing so.
One more: in php.ini, set pgsql.max_persistent lower than 300.
; Maximum number of persistent links.  -1 means no limit.
pgsql.max_persistent = -1   ; change this
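For example (just a sketch -- pick a limit that fits how many Apache children
actually need a database connection at once; 250 is only an illustrative value):

; php.ini -- keep persistent links below Postgres' max_connections
pgsql.allow_persistent = On
pgsql.max_persistent = 250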
C.
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


[PERFORM] Slow select, insert, update

2004-08-10 Thread Paul Langard
Having trouble with one table (see time to count records below!).

Fairly new to postgres so any help much appreciated.

It only contains 9,106 records - as you can see from:


select count(id) from project

count
9106
1 row(s)
Total runtime: 45,778.813 ms


There are only 3 fields:

id
integer
nextval('id'::text)

projectnumber
text

description
text


There is one index:

id_project_ukey
CREATE UNIQUE INDEX id_project_ukey ON project USING btree (id)

... the database is regularly vacuumed.

Re: [PERFORM] [HACKERS] fsync vs open_sync

2004-08-10 Thread pgsql
 Anyway, with fsync enabled using standard fsync(), I get roughly
 300-400
 inserts per second. With fsync disabled, I get about 7000 inserts per
 second. When I re-enable fsync but use the open_sync option, I can get
 about 2500 inserts per second.

 You are getting 300-400 inserts/sec with fsync on?  If you don't mind me
 asking, what's your hardware?  (also, have you checked fsync on #s with
 the new bgwriter in 7.5?)


300 inserts per second with fsync on using fdatasync. 2500 inserts per
second with fsync on using open_sync.
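For anyone wanting to reproduce the comparison, a postgresql.conf sketch
(setting names as in the 7.4-era releases; which wal_sync_method values are
available depends on the platform):

fsync = true
wal_sync_method = open_sync    # also try fsync, fdatasync, open_datasync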

[EMAIL PROTECTED] mwoodward]$ cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 15
model   : 2
model name  : Intel(R) Xeon(TM) CPU 2.40GHz
stepping: 5
cpu MHz : 2399.373
cache size  : 512 KB
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 2
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips: 4784.12

Linux node1 2.4.25 #1 Mon Mar 22 13:33:41 EST 2004 i686 i686 i386 GNU/Linux

ide2: BM-DMA at 0xc400-0xc407, BIOS settings: hde:pio, hdf:pio
hde: Maxtor 6Y200P0, ATA DISK drive
hde: attached ide-disk driver.
hde: host protected area = 1
hde: 398297088 sectors (203928 MB) w/7936KiB Cache, CHS=24792/255/63,
UDMA(100)

PDC20268: IDE controller at PCI slot 06:05.0


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Slow select, insert, update

2004-08-10 Thread Doug McNaught
Paul Langard [EMAIL PROTECTED] writes:

 Having trouble with one table (see time to count records below!).

 Fairly new to postgres so any help much appreciated.

 It only contains 9,106 records - as you can see from:


 select count(id) from project

 count
 9106
 1 row(s)
 Total runtime: 45,778.813 ms

 ... the database is regularly vaccuumed.

Hmm.  You might try a VACUUM FULL and a REINDEX on the table (you
don't say what version you are running--REINDEX is sometimes needed on
7.3 and below).

Also, use EXPLAIN ANALYZE on your query and post the result--that's
helpful diagnostic information.
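Something along these lines, using the table name from your post (a sketch only):

VACUUM FULL VERBOSE project;
REINDEX TABLE project;                          -- mainly useful on 7.3 and earlier
EXPLAIN ANALYZE SELECT count(id) FROM project;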

-Doug
-- 
Let us cross over the river, and rest under the shade of the trees.
   --T. J. Jackson, 1863


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Performance Bottleneck

2004-08-10 Thread matt

 Squid also takes away the work of doing SSL (presuming you're running it
 on a different machine). Unfortunately it doesn't support HTTP/1.1 which
 means that most generated pages (those that don't set Content-length) end
 up forcing squid to close and then reopen the connection to the web
 server.

It is true that it doesn't support http/1.1, but 'most generated pages'? 
Unless they are actually emitted progressively they should have a
perfectly good content-length header.

 I've also had some problems when Squid had a large number of connections
 open (several thousand); though that may have been because of my
 half_closed_clients setting. Squid 3 coped a lot better when I tried it
 (quite a few months ago now - and using FreeBSD and the special kqueue
 system call) but crashed under some (admittedly synthetic) conditions.

It runs out of the box with a very conservative setting for max open file
descriptors - this may or may not be the cause of the problems you have
seen.  Certainly I ran squid with 16,000 connections back in 1999...

 You still have periods of time when the web servers are busy using their
 CPUs to generate HTML rather than waiting for database queries. This is
 especially true if you cache a lot of data somewhere on the web servers
 themselves (which, in my experience, reduces the database load a great
 deal). If you REALLY need to reduce the number of connections (because you
 have a large number of web servers doing a lot of computation, say) then
 it might still be useful.

Aha, a postgres-related topic in this thread!  What you say is very true,
but then given that the connection overhead is so vanishingly small, why
not simply run without a persistent DB connection in this case?  I would
maintain that if your webservers are holding open idle DB connections for
so long that it's a problem, then simply close the connections!

M

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Slow select, insert, update

2004-08-10 Thread Bill Montgomery
Paul,
Paul Langard wrote:
Having trouble with one table (see time to count records below!).
Fairly new to postgres so any help much appreciated.
It only contains 9,106 records - as you can see from:
select count(id) from project
count
9106
1 row(s)
Total runtime: 45,778.813 ms
snip
... the database is regularly vaccuumed. 

Have you tried doing a VACUUM FULL, CLUSTER, or drop/restore on the 
table? This sounds symptomatic of a table with a bunch of dead tuples 
not in the FSM (free space map). Only tuples in the FSM are reclaimed by 
a regular VACUUM. If your FSM parameters in postgresql.conf are not big 
enough for your ratio of UPDATE/DELETE operations to VACUUM frequency, 
you will end up with dead tuples that will only be reclaimed by a VACUUM 
FULL.

To prevent this problem in the future, look at increasing your FSM size 
and possibly vacuuming more frequently or using pg_autovacuum.
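For example, something like this in postgresql.conf (a sketch only -- the right
numbers depend on how many pages get dirtied between vacuums):

max_fsm_pages = 100000       # enough slots for the pages freed between vacuums
max_fsm_relations = 1000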

Good Luck,
Bill Montgomery
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Column order performance

2004-08-10 Thread Bill Montgomery
Josh Berkus wrote:
Does the order of columns of varying size have any effect on
SELECT/INSERT/UPDATE/and/or/DELETE performance? Take the example where
an integer primary key is listed first in the table and alternatively
listed after some large varchar or text columns?
   

No, the order of the columns in the table makes no difference.  They are not 
physically stored in the metadata order, anyway; on the data pages, 
fixed-length fields (e.g. INT, BOOLEAN, etc.) are stored first and 
variable-length fields (CHAR, TEXT, NUMERIC) after them, AFAIK.
 

Is this true even after a table is altered to append say, an integer 
column, after there are already variable-length columns in the table?

-Bill
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Slow select, insert, update

2004-08-10 Thread Tom Lane
Paul Langard [EMAIL PROTECTED] writes:
 select count(id) from project
 count
 9106
 1 row(s)
 Total runtime: 45,778.813 ms

Yipes.  The only explanation I can think of is tremendous table bloat.
What do you get from vacuum verbose project --- in particular, how
many pages in the table?

 ... the database is regularly vaccuumed.

Not regularly enough, perhaps ... or else you need to increase the free
space map size parameters.  In any case you'll probably need to do one
round of vacuum full to get this table back within bounds.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Column order performance

2004-08-10 Thread Josh Berkus
Bill,

 Does the order of columns of varying size have any effect on
 SELECT/INSERT/UPDATE/and/or/DELETE performance? Take the example where
 an integer primary key is listed first in the table and alternatively
 listed after some large varchar or text columns?

No, the order of the columns in the table makes no difference.  They are not 
physically stored in the metadata order, anyway; on the data pages, 
fixed-length fields (e.g. INT, BOOLEAN, etc.) are stored first and 
variable-length fields (CHAR, TEXT, NUMERIC) after them, AFAIK.

The only thing I have seen elusive reports of is that *display* speed can be 
affected by column order (e.g. when you run the query from the command line 
and return many rows), but I've not seen this proven in a test case.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] [HACKERS] fsync vs open_sync

2004-08-10 Thread Josh Berkus
Guys, just so you know:

OSDL did some testing and found Ext3 to be perhaps the worst FS for PostgreSQL 
-- although this testing was with the default options.   Ext3 involved an 
almost 40% write performance penalty compared with Ext2, whereas the penalty 
for ReiserFS and JFS was less than 10%.  

This concurs with my personal experience.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] insert waits for delete with trigger

2004-08-10 Thread Tom Lane
Litao Wu [EMAIL PROTECTED] writes:
 How about:

 select c.relname, l.pid, l.mode, l.granted,
 a.current_query
 from pg_locks l, pg_class c, pg_stat_activity a
 where
   l.relation = c.oid
   AND l.pid =  a.procpid
 order by l.granted, l.pid;

You can't join to pg_class without eliminating the transaction lock rows
(because they have NULLs in the relation field).
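For instance, a sketch of a variant that keeps them, using an outer join so
rows with a NULL relation still show up (column names as in 7.4's pg_locks
and pg_stat_activity; adjust for your version):

select c.relname, l.transaction, l.pid, l.mode, l.granted, a.current_query
from pg_locks l
  left join pg_class c on c.oid = l.relation
  join pg_stat_activity a on a.procpid = l.pid
order by l.granted, l.pid;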

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Hardware upgrade for a high-traffic database

2004-08-10 Thread Jason Coene
Hi Rod,

Actually, we're already using a substantial caching system in code for
nearly all pages delivered - we've exhausted that option.  Our system uses a
login/session table for about 1/8 of our page views (those visitors who are
logged in), and has tracking features.  While I'd love to scrap them and
give the database server a vacation, it's a requirement for us.

You're correct about the query caching (stored in memory) being used - most
of our queries are run once and then come from memory (or, based on speed of
consecutive executions, that seems to be the case).  Once a user hits a page
for the first time in an hour or so, it seems to cache their session query.

The issue that I think we're seeing is that the performance on the 3Ware
RAID is quite bad, watching FreeBSD systat will show it at 100% busy at
around 3.5 MB/s.  When it needs to seek across a table (for, say, an
aggregate function - typically a COUNT()), it slows the entire server down
while working on the disk.  Additionally, VACUUM's make the server
practically useless.  We have indexes on everything that's used in queries,
and the planner is using them.

The server has 2GB of physical memory; however, it only uses between 130MB
and 200MB of it.  Postgres is the only application running on the server.

Our pertinent settings look like this:

max_connections = 512

shared_buffers = 2
sort_mem = 2000
vacuum_mem = 2
effective_cache_size = 30

fsync = false
wal_sync_method = fsync
wal_buffers = 32

checkpoint_segments = 2
checkpoint_timeout = 30
commit_delay = 1

Typically, we don't use anywhere near the 512 connections - however there
are peak hours where we come close, and other times that we eclipse it and
run out (should some connections become serialized due to a slowdown).  It's
not something that we can comfortably lower.

The non-standard checkpoint settings have helped make it less likely that
a large (in disk time) query will conflict with a checkpoint write.

I'm a programmer - definitely not a DBA by any stretch - though I am forced
into the role.  From reading this list, it seems to me that our settings are
reasonable given our usage, and that a disk upgrade is likely in order.

I'd love to hear any suggestions.

Thanks,

Jason
 
-Original Message-
From: Rod Taylor [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, August 10, 2004 7:07 PM
To: Jason Coene
Cc: Postgresql Performance
Subject: Re: [PERFORM] Hardware upgrade for a high-traffic database

 Our database is about 20GB on disk, we have some quite large tables - 2M
 rows with TEXT fields in a sample table, accessed constantly.  We average
 about 4,000 - 5,000 queries per second - all from web traffic.  As you can

99% is reads? and probably the same data over and over again? You might
want to think about a small code change to cache sections of page output
in memory for the most commonly generated pages (there are usually 3 or
4 that account for 25% to 50% of web traffic -- starting pages).

The fact you're getting 5k queries/second off IDE drives tells me most
of the active data is in memory -- so your actual working data set is
probably quite small (less than 10% of the 20GB).


If the above is all true (mostly reads, smallish dataset, etc.) and the
database is not growing very quickly, you might want to look into RAM
and RAM bandwidth over disk. An Opteron with 8GB ram using the same old
IDE drives. Get a mobo with a SCSI raid controller in it, so the disk
component can be upgraded in the future (when necessary).




---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Hardware upgrade for a high-traffic database

2004-08-10 Thread Scott Marlowe
On Tue, 2004-08-10 at 13:17, Jason Coene wrote:
 Hi All,
 
 We're currently running Postgres 7.4.1 on FreeBSD 5.2, a Dual Xeon 2.4, 2GB
 ECC, 3Ware Serial ATA RAID 5 w/ 4 disks (SLOW!!).
 
 Our database is about 20GB on disk, we have some quite large tables - 2M
 rows with TEXT fields in a sample table, accessed constantly.  We average
 about 4,000 - 5,000 queries per second - all from web traffic.  As you can
 imagine, we're quite disk limited and checkpoints can be killer.
 Additionally, we see queries and connections getting serialized due to
 queries that take a long time (5 sec or so) while waiting on disk access.
 No fun at all.
 
 We've tweaked everything long and hard, and at the end of the day, the disk
 is killing us.
 
 We're looking to upgrade our server - or rather, replace it as it has no
 upgrade path to SCSI.  I'm considering going Opteron (though right now we
 don't need more CPU time), and am looking for suggestions on what an optimal
 RAID configuration may look like (disks, controller, cache setting).  We're
 in the market to buy right now - any good vendor suggestions?

I've had very good luck with LSI MegaRAID controllers with battery
backed cache.  The amount of cache doesn't seem as important as having
it, and having it set for write back.

After that, 2 gigs or more of memory is the next improvement.

After that, the speed of the memory.



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Hardware upgrade for a high-traffic database

2004-08-10 Thread Rod Taylor
 Our database is about 20GB on disk, we have some quite large tables - 2M
 rows with TEXT fields in a sample table, accessed constantly.  We average
 about 4,000 - 5,000 queries per second - all from web traffic.  As you can

99% is reads? and probably the same data over and over again? You might
want to think about a small code change to cache sections of page output
in memory for the most commonly generated pages (there are usually 3 or
4 that account for 25% to 50% of web traffic -- starting pages).

The fact you're getting 5k queries/second off IDE drives tells me most
of the active data is in memory -- so your actual working data set is
probably quite small (less than 10% of the 20GB).


If the above is all true (mostly reads, smallish dataset, etc.) and the
database is not growing very quickly, you might want to look into RAM
and RAM bandwidth over disk. An Opteron with 8GB ram using the same old
IDE drives. Get a mobo with a SCSI raid controller in it, so the disk
component can be upgraded in the future (when necessary).



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[PERFORM] Bulk Insert and Index use

2004-08-10 Thread Rudi Starcevic
Hi,
I have a question on bulk checking, inserting into a table and
how best to use an index for performance.
The data I have to work with is a monthly CD Rom csv data dump of
300,000 property owners from one area/shire.
So every CD has 300,000-odd lines, each of which fills a row in the 
'property' table.

Beginning with the first CD each line should require one SELECT and
one INSERT as it will be the first property with this address.
The SELECT uses fields like 'street' and 'suburb', to check for an 
existing property,
so I have built an index on those fields.

My question is: does each INSERT rebuild the index on the 'street' and 
'suburb' fields?
I believe it does, but I'm asking to be sure.

If this is the case I guess performance will suffer when I have, say, 
200,000
rows in the table.

Would it be like:
a) Use index to search on 'street' and 'suburb'
b) No result? Insert new record
c) Rebuild index on 'street' and 'suburb'
for each row?
Would this mean that after 200,000 rows each INSERT will require
an index covering hundreds of thousands of rows to be rebuilt?
So far I believe my only options are to use either an index
or a sequential scan and see which is faster.
A minute for your thoughts and/or suggestions would be great.
Thanks.
Regards,
Rudi.
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Slow select, insert, update

2004-08-10 Thread Litao Wu
Does that mean reindex is not needed
for PG version 7.4?

In what kind of situations under PG 7.4
is reindex worthwhile?

Thanks,
 

Here is doc from 7.3:
PostgreSQL is unable to reuse B-tree index pages in
certain cases. The problem is that if indexed rows are
deleted, those index pages can only be reused by rows
with similar values. For example, if indexed rows are
deleted and newly inserted/updated rows have much
higher values, the new rows can't use the index space
made available by the deleted rows. Instead, such new
rows must be placed on new index pages. In such cases,
disk space used by the index will grow indefinitely,
even if VACUUM is run frequently. 

As a solution, you can use the REINDEX command
periodically to discard pages used by deleted rows.
There is also contrib/reindexdb which can reindex an
entire database. 

The counterpart of 7.4 is:
In some situations it is worthwhile to rebuild indexes
periodically with the REINDEX command. (There is also
contrib/reindexdb which can reindex an entire
database.) However, PostgreSQL 7.4 has substantially
reduced the need for this activity compared to earlier
releases. 


--- Doug McNaught [EMAIL PROTECTED] wrote:

 Paul Langard [EMAIL PROTECTED] writes:
 
  Having trouble with one table (see time to count
 records below!).
 
  Fairly new to postgres so any help much
 appreciated.
 
  It only contains 9,106 records - as you can see
 from:
 
 
  select count(id) from project
 
  count
  9106
  1 row(s)
  Total runtime: 45,778.813 ms
 
  ... the database is regularly vaccuumed.
 
 Hmm.  You might try a VACUUM FULL and a REINDEX on
 the table (you
 don't say what version you are running--REINDEX is
 sometimes needed on
 7.3 and below).
 
 Also, use EXPLAIN ANALYZE on your query and post the
 result--that's
 helpful diagnostic information.
 
 -Doug
 -- 
 Let us cross over the river, and rest under the
 shade of the trees.
--T. J. Jackson, 1863
 
 
 ---(end of
 broadcast)---
 TIP 8: explain analyze is your friend
 





---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Bulk Insert and Index use

2004-08-10 Thread Jim J
If the bulk load has the possibility of duplicating data, then you need 
to change methods.  Try bulk loading into a temp table,  index it like 
the original, eliminate the dups and merge the tables.

It is also possible to do an insert from the temp table into the final 
table like:
insert into original (x,x,x)  (select temp.1, temp.2, etc from temp left 
join original on temp.street=original.street where original.street is null)
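Spelled out a bit more concretely (a sketch only -- 'property', 'street' and
'suburb' come from the earlier post; the temp-table name and the extra column
are just stand-ins):

insert into property (street, suburb, owner_name)
select t.street, t.suburb, t.owner_name
from property_load t
  left join property p
    on p.street = t.street and p.suburb = t.suburb
where p.street is null;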

Good Luck
Jim
Rudi Starcevic wrote:
Hi,
I have a question on bulk checking, inserting into a table and
how best to use an index for performance.
The data I have to work with is a monthly CD Rom csv data dump of
300,000 property owners from one area/shire.
So every CD has 300,000 odd lines, each line of data which fills the 
'property' table.

Beginning with the first CD each line should require one SELECT and
one INSERT as it will be the first property with this address.
The SELECT uses fields like 'street' and 'suburb', to check for an 
existing property,
so I have built an index on those fields.

My question is does each INSERT rebuild the index on the 'street' and 
'suburb' fields?
I believe it does but I'm asking to be sure.

If this is the case I guess performance will suffer when I have, say, 
200,000
rows in the table.

Would it be like:
a) Use index to search on 'street' and 'suburb'
b) No result? Insert new record
c) Rebuild index on 'street' and 'suburb'
for each row?
Would this mean that after 200,000 rows each INSERT will require
the index of 000's of rows to be re-indexed?
So far I believe my only options are to use either and index
or sequential scan and see which is faster.
A minute for your thoughts and/or suggestions would be great.
Thanks.
Regards,
Rudi.
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Bulk Insert and Index use

2004-08-10 Thread Gregory S. Williamson

Usually any bulk load is faster with indexes dropped and then rebuilt ... failing that 
(like when you really need the indexes while loading, say into a hot table) be sure to 
wrap all the SQL into one transaction (BEGIN; ... COMMIT;) ... if any data fails it all 
fails, which is usually easier to deal with than partial data loads, and it is *much* 
faster than having each insert be its own transaction.
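For instance (a sketch -- the index and column names are placeholders, not
from the real schema):

-- option 1: drop, bulk load, then rebuild
drop index property_street_suburb_idx;
-- ... COPY or batched INSERTs here ...
create index property_street_suburb_idx on property (street, suburb);

-- option 2: keep the index, but do the whole load in one transaction
begin;
insert into property (street, suburb) values ('1 Main St', 'Springfield');
-- ... many more inserts ...
commit;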

HTH,

Greg Williamson
DBA
GlobeXplorer LLC

-Original Message-
From:   Rudi Starcevic [mailto:[EMAIL PROTECTED]
Sent:   Tue 8/10/2004 4:04 PM
To: [EMAIL PROTECTED]
Cc: 
Subject:[PERFORM] Bulk Insert and Index use
Hi,

I have a question on bulk checking, inserting into a table and
how best to use an index for performance.

The data I have to work with is a monthly CD Rom csv data dump of
300,000 property owners from one area/shire.

So every CD has 300,000 odd lines, each line of data which fills the 
'property' table.

Beginning with the first CD each line should require one SELECT and
one INSERT as it will be the first property with this address.

The SELECT uses fields like 'street' and 'suburb', to check for an 
existing property,
so I have built an index on those fields.

My question is does each INSERT rebuild the index on the 'street' and 
'suburb' fields?
I believe it does but I'm asking to be sure.

If this is the case I guess performance will suffer when I have, say, 
200,000
rows in the table.

Would it be like:

a) Use index to search on 'street' and 'suburb'
b) No result? Insert new record
c) Rebuild index on 'street' and 'suburb'

for each row?
Would this mean that after 200,000 rows each INSERT will require
the index of 000's of rows to be re-indexed?

So far I believe my only options are to use either and index
or sequential scan and see which is faster.

A minute for your thoughts and/or suggestions would be great.

Thanks.
Regards,
Rudi.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster




---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Bulk Insert and Index use

2004-08-10 Thread Rudi Starcevic
Hi Jim,
Thanks for your time.
 If the bulk load has the possibility of duplicating data
Yes, each row will require either:
a) One SELECT + One INSERT
or
b) One SELECT + One UPDATE
I did think of using more than one table, ie. temp table.
As each month's worth of data is added I expect to see
a change from lots of INSERTs to lots of UPDATEs.
Perhaps when the UPDATEs become more dominant it would
be best to start using indexes.
While INSERTs are more prevalent perhaps a seq. scan is better.
I guess of all the options available it boils down to which
is quicker for my data: index or sequential scan.
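(One way I could compare the two once some real data is loaded -- a sketch,
with made-up values; enable_seqscan is a standard planner setting:)

set enable_seqscan = off;   -- force the index path if the index exists
explain analyze
  select 1 from property where street = '1 Main St' and suburb = 'Springfield';
set enable_seqscan = on;    -- back to the planner's default choice
explain analyze
  select 1 from property where street = '1 Main St' and suburb = 'Springfield';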
Many thanks.
Jim J wrote:
If the bulk load has the possibility of duplicating data, then you need 
to change methods.  Try bulk loading into a temp table,  index it like 
the original, eliminate the dups and merge the tables.

It is also possible to do an insert from the temp table into the final 
table like:
insert into original (x,x,x)  (select temp.1, temp.2, etc from temp left 
join original on temp.street=original.street where original.street is null)

Good Luck
Jim
Rudi Starcevic wrote:
Hi,
I have a question on bulk checking, inserting into a table and
how best to use an index for performance.
The data I have to work with is a monthly CD Rom csv data dump of
300,000 property owners from one area/shire.
So every CD has 300,000 odd lines, each line of data which fills the 
'property' table.

Beginning with the first CD each line should require one SELECT and
one INSERT as it will be the first property with this address.
The SELECT uses fields like 'street' and 'suburb', to check for an 
existing property,
so I have built an index on those fields.

My question is does each INSERT rebuild the index on the 'street' and 
'suburb' fields?
I believe it does but I'm asking to be sure.

If this is the case I guess performance will suffer when I have, say, 
200,000
rows in the table.

Would it be like:
a) Use index to search on 'street' and 'suburb'
b) No result? Insert new record
c) Rebuild index on 'street' and 'suburb'
for each row?
Would this mean that after 200,000 rows each INSERT will require
the index of 000's of rows to be re-indexed?
So far I believe my only options are to use either and index
or sequential scan and see which is faster.
A minute for your thoughts and/or suggestions would be great.
Thanks.
Regards,
Rudi.
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


--
Regards,
Rudi.
Internet Media Productions
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] Bulk Insert and Index use

2004-08-10 Thread Christopher Browne
In an attempt to throw the authorities off his trail, [EMAIL PROTECTED] (Rudi 
Starcevic) transmitted:
 A minute for your thoughts and/or suggestions would be great.

Could you give a more concrete example?  E.g. - the DDL for the
table(s), most particularly.

At first guess, I think you're worrying about a nonissue.  Each insert
will lead to a _modification_ of the various indices, which costs
_something_, but which is WAY less expensive than creating each index
from scratch.

But perhaps I'm misreading things; DDL for the intended tables and
indices would be real handy.
-- 
output = (cbbrowne @ cbbrowne.com)
http://www.ntlug.org/~cbbrowne/linux.html
Rules  of the  Evil  Overlord #21.  I  will hire  a talented  fashion
designer  to create  original uniforms  for my  Legions of  Terror, as
opposed  to  some cheap  knock-offs  that  make  them look  like  Nazi
stormtroopers, Roman  footsoldiers, or savage Mongol  hordes. All were
eventually  defeated and  I want  my troops  to have  a  more positive
mind-set. http://www.eviloverlord.com/

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Bulk Insert and Index use

2004-08-10 Thread Gregory S. Williamson
If it has to read a majority (or even a good percentage) of the rows in question a 
sequential scan is probably faster ... and as Jim pointed out, a temp table can often 
be a useful medium for getting speed in a load and then allowing you to clean/alter 
data for a final (easy) push.

G
-Original Message-
From:   Rudi Starcevic [mailto:[EMAIL PROTECTED]
Sent:   Tue 8/10/2004 8:33 PM
To: [EMAIL PROTECTED]
Cc: 
Subject:Re: [PERFORM] Bulk Insert and Index use
Hi Jim,

Thanks for your time.

  If the bulk load has the possibility of duplicating data

Yes, each row will require either:

a) One SELECT + One INSERT
or
b) One SELECT + One UPDATE

I did think of using more than one table, ie. temp table.
As each month worth of data is added I expect to see
a change from lots of INSERTS to lots of UPDATES.

Perhaps when the UPDATES become more dominant it would
be best to start using Indexes.

While INSERTS are more prevelant perhaps a seq. scan is better.

I guess of all the options available it boils down to which
is quicker for my data: index or sequential scan.

Many thanks.

Jim J wrote:

 If the bulk load has the possibility of duplicating data, then you need 
 to change methods.  Try bulk loading into a temp table,  index it like 
 the original, eliminate the dups and merge the tables.
 
 It is also possible to do an insert from the temp table into the final 
 table like:
 insert into original (x,x,x)  (select temp.1, temp.2, etc from temp left 
 join original on temp.street=original.street where original.street is null)
 
 Good Luck
 Jim
 
 Rudi Starcevic wrote:
 
 Hi,

 I have a question on bulk checking, inserting into a table and
 how best to use an index for performance.

 The data I have to work with is a monthly CD Rom csv data dump of
 300,000 property owners from one area/shire.

 So every CD has 300,000 odd lines, each line of data which fills the 
 'property' table.

 Beginning with the first CD each line should require one SELECT and
 one INSERT as it will be the first property with this address.

 The SELECT uses fields like 'street' and 'suburb', to check for an 
 existing property,
 so I have built an index on those fields.

 My question is does each INSERT rebuild the index on the 'street' and 
 'suburb' fields?
 I believe it does but I'm asking to be sure.

 If this is the case I guess performance will suffer when I have, say, 
 200,000
 rows in the table.

 Would it be like:

 a) Use index to search on 'street' and 'suburb'
 b) No result? Insert new record
 c) Rebuild index on 'street' and 'suburb'

 for each row?
 Would this mean that after 200,000 rows each INSERT will require
 the index of 000's of rows to be re-indexed?

 So far I believe my only options are to use either and index
 or sequential scan and see which is faster.

 A minute for your thoughts and/or suggestions would be great.

 Thanks.
 Regards,
 Rudi.


 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster


 
 
 ---(end of broadcast)---
 TIP 5: Have you checked our extensive FAQ?
 
   http://www.postgresql.org/docs/faqs/FAQ.html
 
 


-- 


Regards,
Rudi.

Internet Media Productions

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org




---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Bulk Insert and Index use

2004-08-10 Thread Rudi Starcevic
Hi,
In an attempt to throw the authorities off his trail, [EMAIL PROTECTED] (Rudi 
Starcevic) transmitted:
A minute for your thoughts and/or suggestions would be great.
Heh heh 
Could you give a more concrete example?  E.g. - the DDL for the
table(s), most particularly.
Thanks, I didn't add the DDL as I thought it might make my question too
long. I have the DDL at another office, so I'll pick up this email
thread when I get there in a couple of hours.
At first guess, I think you're worrying about a nonissue.  Each insert
will lead to a _modification_ of the various indices, which costs
_something_, but which is WAY less expensive than creating each index
from scratch.
Very interesting, modification and creation.
I will post another email later today.
Many thanks.
--
Regards,
Rudi.
Internet Media Productions
---(end of broadcast)---
TIP 8: explain analyze is your friend


[PERFORM] NUMERIC x VARCHAR

2004-08-10 Thread Er Galvão Abbott




Greetings.

I have a question regarding performance of certain datatypes:

I have a field where I will store my clients' phone numbers. I know that
this field will never exceed 15 characters, and I will store only
numbers here (no dashes, dots, etc.), so I was wondering:

Which type is faster: NUMERIC(15,0) or VARCHAR(15)? Are there any
storage differences between them?
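(In case it helps to test both, a trivial sketch with a made-up table name:)

create table phone_test (
    phone_numeric  numeric(15,0),
    phone_varchar  varchar(15)
);
insert into phone_test values (5551234567890, '5551234567890');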

TIA,

-- 
Er Galvão Abbott
Desenvolvedor Web
http://www.galvao.eti.br/
[EMAIL PROTECTED]