[PERFORM] size of cache

2005-07-13 Thread Jean-Max Reymond
With my application, it seems that the size of the L2 cache has a great
effect: going from 512 KB to 1 MB of L2 cache boosts performance by a
factor of 3, and going from 1 MB to 2 MB gains another 20%.
I don't understand why a 512 KB L2 cache is too small to fit the data.
Is there a tool to trace processor activity and confirm that the
processor is waiting for memory?
Is there a tool to snapshot Postgres activity and understand where the
time is spent, so we can potentially avoid the bottleneck?
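
For the second question, one low-tech snapshot is available from SQL
itself. A minimal sketch (assuming stats_command_string = true is set,
so that current_query is populated):

  -- show what each backend is executing right now
  SELECT procpid, usename, current_query
  FROM pg_stat_activity
  WHERE current_query <> '<IDLE>';

Polling this during a slow spell shows where the time goes at the query
level; confirming that the processor is stalled on memory needs a
hardware-level profiler (see the reply later in this digest).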

Thanks for your tips.

-- 
Jean-Max Reymond
CKR Solutions Open Source
Nice France
http://www.ckr-solutions.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PERFORM] large table vs multiple small tables

2005-07-13 Thread Nicolas Beaume

Hello

I have a large database with 4 large tables (each containing at least
200,000 rows, perhaps even 1 or 2 million) and I wonder whether it would
be better to split them into many small tables (e.g. tables of 2,000
rows) to speed up access to and updates of those tables (considering
that I will have few updates but a lot of reads).


Do you think it would be efficient?
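
For concreteness, a split along these lines would normally use
inheritance. A sketch only, with invented names (and note that the
replies later in this digest suggest tables of this size are fine as
they are):

  CREATE TABLE samples (id integer, created date, payload text);
  -- one child per date range, each carrying a CHECK constraint
  CREATE TABLE samples_2005 (
      CHECK (created >= '2005-01-01' AND created < '2006-01-01')
  ) INHERITS (samples);

In the 7.x/8.0 planner a query against the parent still scans every
child, so a split like this pays off mainly for maintenance, not reads.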

Nicolas, wondering if he hasn't been too greedy

--

-
"Be what you would seem to be" (Lewis Carroll)


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] General DB Tuning

2005-07-13 Thread Mark Rae
On Wed, Jul 13, 2005 at 09:52:20AM +0800, Christopher Kings-Lynne wrote:
 The 8.0.2 jdbc driver uses real prepared statements instead of faked
 ones.  The problem is the new protocol (that the 8.0.2 driver uses) has
 a bug where protocol-prepared queries don't get logged properly.
 I don't know if it's been fixed...

It's not in 8.0.3, but I was having the same problems with DBD::Pg, so
I backported some of it and also changed the code so that it lists the
values of the bind parameters. You get something like:

LOG:  statement: SELECT sr.name,sr.seq_region_id, sr.length, 1 FROM seq_region sr  WHERE sr.name = $1 AND sr.coord_system_id = $2
LOG:  binding: "dbdpg_2" with 2 parameters
LOG:  bind "dbdpg_2" $1 = "20"
LOG:  bind "dbdpg_2" $2 = "1"
LOG:  statement: EXECUTE   [PREPARE:  SELECT sr.name,sr.seq_region_id, sr.length, 1 FROM seq_region sr  WHERE sr.name = $1 AND sr.coord_system_id = $2]
LOG:  duration: 0.164 ms

I've attached a patch in case anyone finds it useful.

-Mark
*** postgresql-8.0.3/src/backend/tcop/postgres.c  2005-07-13 09:42:04.997669193 +0100
--- postgresql-8.0.3/src/backend/tcop/postgres.c  2005-07-13 09:34:24.618195580 +0100
***************
*** 1370,1375 ****
--- 1370,1378 ----
      else
          portal = CreatePortal(portal_name, false, false);
  
+     if (log_statement == LOGSTMT_ALL)
+         ereport(LOG, (errmsg("binding: \"%s\" with %d parameters",
+                              stmt_name, numParams)));
+ 
      /*
       * Fetch parameters, if any, and store in the portal's memory context.
       *
***************
*** 1428,1433 ****
--- 1431,1439 ----
                       * grotty but is a big win when dealing with very
                       * large parameter strings.
                       */
+                     if (log_statement == LOGSTMT_ALL)
+                         ereport(LOG, (errmsg("bind \"%s\" $%d = \"%s\"",
+                                              stmt_name, i + 1, pvalue)));
+ 
                      pbuf.data = (char *) pvalue;
                      pbuf.maxlen = plength + 1;
                      pbuf.len = plength;
***************
*** 1578,1583 ****
--- 1584,1593 ----
      bool        is_trans_exit = false;
      bool        completed;
      char        completionTag[COMPLETION_TAG_BUFSIZE];
+     struct timeval start_t, stop_t;
+     bool        save_log_duration = log_duration;
+     int         save_log_min_duration_statement = log_min_duration_statement;
+     bool        save_log_statement_stats = log_statement_stats;
  
      /* Adjust destination to tell printtup.c what to do */
      dest = whereToSendOutput;
***************
*** 1614,1619 ****
--- 1624,1647 ----
  
      set_ps_display(portal->commandTag);
  
+     /*
+      * We use save_log_* so "SET log_duration = true" and "SET
+      * log_min_duration_statement = true" don't report incorrect time
+      * because gettimeofday() wasn't called. Similarly,
+      * log_statement_stats has to be captured once.
+      */
+     if (save_log_duration || save_log_min_duration_statement != -1)
+         gettimeofday(&start_t, NULL);
+ 
+     if (save_log_statement_stats)
+         ResetUsage();
+ 
+     if (log_statement == LOGSTMT_ALL)
+         /* We have the portal, so output the source query. */
+         ereport(LOG,
+                 (errmsg("statement: EXECUTE %s  [PREPARE:  %s]", portal_name,
+                         portal->sourceText ? portal->sourceText : "")));
+ 
      BeginCommand(portal->commandTag, dest);
  
      /* Check for transaction-control commands */
***************
*** 1708,1713 ****
--- 1736,1785 ----
          pq_putemptymessage('s');
      }
  
+     /*
+      * Combine processing here as we need to calculate the query duration
+      * in both instances.
+      */
+     if (save_log_duration || save_log_min_duration_statement != -1)
+     {
+         long        usecs;
+ 
+         gettimeofday(&stop_t, NULL);
+         if (stop_t.tv_usec < start_t.tv_usec)
+         {
+             stop_t.tv_sec--;
+             stop_t.tv_usec += 1000000;
+         }
+         usecs = (long) (stop_t.tv_sec - start_t.tv_sec) * 1000000 +
+                 (long) (stop_t.tv_usec - start_t.tv_usec);
+ 
+         /* Only print duration if we previously printed the statement. */
+         if (log_statement == LOGSTMT_ALL && save_log_duration)
+             ereport(LOG,
+                     (errmsg("duration: %ld.%03ld ms",
+                             (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 +
+                                     (stop_t.tv_usec - start_t.tv_usec) / 1000),
[PERFORM] (no subject)

2005-07-13 Thread Nicolas Beaume

Nicolas,

These sizes would not be considered large. I would leave them
as single tables.

Ken




OK, I thought it was large, but I must confess I'm relatively new to
the database world. Thank you for the answer.


Just another question: what is the maximum number of rows that can be
contained in a cursor?
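
For what it's worth, a cursor does not materialize a fixed-size result
up front, so there is no specific maximum row count; rows are fetched
incrementally. A sketch, with an invented table name:

  BEGIN;
  DECLARE big_cur CURSOR FOR SELECT * FROM samples;
  FETCH 1000 FROM big_cur;   -- repeat until no rows are returned
  CLOSE big_cur;
  COMMIT;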


Nicolas, having a lot of things to learn

--

-
"Be what you would seem to be" (Lewis Carroll)


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] cost-based vacuum

2005-07-13 Thread Simon Riggs
On Tue, 2005-07-12 at 13:50 -0400, Ian Westmacott wrote:
 It appears not to matter whether it is one of the tables
 being written to that is ANALYZEd.  I can ANALYZE an old,
 quiescent table, or a system table and see this effect.

Can you confirm that this effect is still seen even when the ANALYZE
doesn't touch *any* of the tables being accessed?

 - this is a dual Xeon. 

Is that Xeon MP then?

 - Looking at oprofile reports for 10-minute runs of a
   database-wide VACUUM with vacuum_cost_delay=0 and 1000,
   shows the latter spending a lot of time in LWLockAcquire
   and LWLockRelease (20% each vs. 2%).

Is this associated with high context switching also?

Best Regards, Simon Riggs
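
For readers following along, the comparison Ian describes can be
reproduced per session; a sketch:

  SET vacuum_cost_delay = 0;     -- unthrottled run
  VACUUM;
  SET vacuum_cost_delay = 1000;  -- throttled run: sleep 1000 ms whenever
                                 -- the accumulated cost reaches vacuum_cost_limit
  VACUUM;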


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] size of cache

2005-07-13 Thread Simon Riggs
On Wed, 2005-07-13 at 10:20 +0200, Jean-Max Reymond wrote:
 With my application, it seems that the size of the L2 cache has a great
 effect: going from 512 KB to 1 MB of L2 cache boosts performance by a
 factor of 3, and going from 1 MB to 2 MB gains another 20%.

Memory request time is the main bottleneck in well tuned database
systems, so your results could be reasonable. 

 I don't understand why a 512 KB L2 cache is too small to fit the data.
 Is there a tool to trace processor activity and confirm that the
 processor is waiting for memory?

You have both data and instruction cache on the CPU. It is likely it is
the instruction cache that is too small to fit all of the code required
for your application's workload mix.

Use Intel VTune or similar to show the results you seek. 

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] General DB Tuning

2005-07-13 Thread Simon Riggs
On Wed, 2005-07-13 at 09:52 +0800, Christopher Kings-Lynne wrote:
  Is there a different kind of 'prepared' statement
  that we should be using in the driver to get logging
  to work properly?  What is the 'new' protocol?
 
 The 8.0.2 jdbc driver uses real prepared statements instead of faked
 ones.  The problem is the new protocol (that the 8.0.2 driver uses) has
 a bug where protocol-prepared queries don't get logged properly.
 
 I don't know if it's been fixed...

Yes, there is a fix for this in 8.1.

Brent has been sent the details.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] cost-based vacuum

2005-07-13 Thread Ian Westmacott
On Wed, 2005-07-13 at 11:55, Simon Riggs wrote:
 On Tue, 2005-07-12 at 13:50 -0400, Ian Westmacott wrote:
  It appears not to matter whether it is one of the tables
  being written to that is ANALYZEd.  I can ANALYZE an old,
  quiescent table, or a system table and see this effect.
 
 Can you confirm that this effect is still seen even when the ANALYZE
 doesn't touch *any* of the tables being accessed?

Yes.

  - this is a dual Xeon. 
 
 Is that Xeon MP then?

Yes.

  - Looking at oprofile reports for 10-minute runs of a
database-wide VACUUM with vacuum_cost_delay=0 and 1000,
shows the latter spending a lot of time in LWLockAcquire
and LWLockRelease (20% each vs. 2%).
 
 Is this associated with high context switching also?

Yes, it appears that context switches increase up to 4-5x
during cost-based ANALYZE.

--Ian



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Dan Harris

Gurus,

A table in one of my databases has just crossed the 30 million row
mark and has begun to feel very sluggish for just about anything I do
with it.  I keep the entire database vacuumed regularly.  And, as
long as I'm not doing a sequential scan, things seem reasonably quick
most of the time.  I'm now thinking that my problem is I/O, because
anything that involves heavy I/O (like a seq scan) seems to slow to
a crawl.  Even if I am using indexed fields to grab a few thousand
rows, once it switches to sequential scans it gets very, very slow.


I have also had occurrences where queries will not finish for days
(I eventually have to kill them).  I was hoping to provide an
explain analyze for them, but if they never finish... even the
explain never finishes when I try that.


For example, as I'm writing this, I am running an UPDATE statement  
that will affect a small part of the table, and is querying on an  
indexed boolean field.


I have been waiting for over an hour and a half as I write this and
it still hasn't finished.  I'm thinking I bet Tom, Simon or Josh
wouldn't put up with this kind of wait time, so I thought I would
see if anyone here had some pointers.  Maybe I have a really stupid
setting in my conf file that is causing this.  I really can't believe
I am at the limits of this hardware, however.



The query:
update eventactivity set ftindex = false where ftindex = true;
( added the where clause because I don't want to alter rows where
ftindex is null )




The table:
   Column    |            Type             | Modifiers
-------------+-----------------------------+-----------
entrydate   | timestamp without time zone |
incidentid  | character varying(40)   |
statustype  | character varying(20)   |
unitid  | character varying(20)   |
recordtext  | character varying(255)  |
recordtext2 | character varying(255)  |
insertdate  | timestamp without time zone |
ftindex | boolean |
Indexes: eventactivity1 btree (incidentid),
 eventactivity_entrydate_idx btree (entrydate),
 eventactivity_ftindex_idx btree (ftindex),
 eventactivity_oid_idx btree (oid)




The hardware:

4 x 2.2GHz Opterons
12 GB of RAM
4x10k 73GB Ultra320 SCSI drives in RAID 0+1
1GB hardware cache memory on the RAID controller

The OS:
Fedora, kernel 2.6.6-1.435.2.3smp ( redhat stock kernel )
filesystem is mounted as ext2

#

vmstat output ( as I am waiting for this to finish ):
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  1   5436 2823908  26140 9183704   0    1   2211   540  694   336  9  2 76 13


#

iostat output ( as I am waiting for this to finish ):
avg-cpu:  %user   %nice    %sys %iowait   %idle
           9.19    0.00    2.19   13.08   75.53

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
cciss/c0d0  329.26 17686.03  4317.57  161788630   39496378


#
This is a dedicated postgresql server, so maybe some of these  
settings are more liberal than they should be?


relevant ( I hope ) postgresql.conf options are:

shared_buffers = 50000
effective_cache_size = 1348000
random_page_cost = 3
work_mem = 512000
max_fsm_pages = 80000
log_min_duration_statement = 60000
fsync = true ( not sure if I'm daring enough to run without this )
wal_buffers = 1000
checkpoint_segments = 64
checkpoint_timeout = 3000


# FOR PG_AUTOVACUUM --#
stats_command_string = true
stats_row_level = true

Thanks in advance,
Dan








---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] cost-based vacuum

2005-07-13 Thread Tom Lane
Ian Westmacott [EMAIL PROTECTED] writes:
 On Wed, 2005-07-13 at 11:55, Simon Riggs wrote:
 On Tue, 2005-07-12 at 13:50 -0400, Ian Westmacott wrote:
 It appears not to matter whether it is one of the tables
 being written to that is ANALYZEd.  I can ANALYZE an old,
 quiescent table, or a system table and see this effect.
 
 Can you confirm that this effect is still seen even when the ANALYZE
 doesn't touch *any* of the tables being accessed?

 Yes.

This really isn't making any sense at all.  I took another look through
the vacuum_delay_point() calls, and I can see a couple that are
questionably placed:

* the one in count_nondeletable_pages() is done while we are holding
exclusive lock on the table; we might be better off not to delay there,
so as not to block non-VACUUM processes longer than we have to.

* the ones in hashbulkdelete and rtbulkdelete are done while holding
various forms of exclusive locks on the index (this was formerly true
of gistbulkdelete as well).  Again it might be better not to delay.

However, these certainly do not explain Ian's problem, because (a) these
only apply to VACUUM, not ANALYZE; (b) they would only lock the table
being VACUUMed, not other ones; (c) if these locks were to block the
reader or writer thread, it'd manifest as blocking on a semaphore, not
as a surge in LWLock thrashing.

 Is that Xeon MP then?

 Yes.

The LWLock activity is certainly suggestive of prior reports of
excessive buffer manager lock contention, but it makes *no* sense that
that would be higher with vacuum cost delay than without.  I'd have
expected the other way around.

I'd really like to see a test case for this...

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Dan Harris


On Jul 13, 2005, at 1:11 PM, John A Meinel wrote:



I might be wrong, but there may be something much more substantially
wrong than slow i/o.
John



Yes, I'm afraid of that too.  I just don't know what tools I should  
use to figure that out.  I have some 20 other databases on this  
system, same schema but varying sizes, and the small ones perform  
very well.  It feels like there is an O(n) increase in wait time that  
has recently become very noticeable on the largest of them.


-Dan

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Stephen Frost
* Dan Harris ([EMAIL PROTECTED]) wrote:
 On Jul 13, 2005, at 1:11 PM, John A Meinel wrote:
 I might be wrong, but there may be something much more substantially
 wrong than slow i/o.
 
 Yes, I'm afraid of that too.  I just don't know what tools I should  
 use to figure that out.  I have some 20 other databases on this  
 system, same schema but varying sizes, and the small ones perform  
 very well.  It feels like there is an O(n) increase in wait time that  
 has recently become very noticeable on the largest of them.

Could you come up w/ a test case that others could reproduce where
explain isn't returning?  I think that would be very useful towards
solving at least that issue...

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Alvaro Herrera
On Wed, Jul 13, 2005 at 01:16:25PM -0600, Dan Harris wrote:

 On Jul 13, 2005, at 1:11 PM, John A Meinel wrote:
 
 I might be wrong, but there may be something much more substantially
 wrong than slow i/o.
 
 Yes, I'm afraid of that too.  I just don't know what tools I should  
 use to figure that out.  I have some 20 other databases on this  
 system, same schema but varying sizes, and the small ones perform  
 very well.  It feels like there is an O(n) increase in wait time that  
 has recently become very noticeable on the largest of them.

I'd guess it's stuck on some lock.  Try that EXPLAIN, and when it
blocks, watch the pg_locks view for locks not granted to the process
executing the EXPLAIN.  Then check what else is holding the locks.
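
A minimal form of that query (column names as in the 8.0 pg_locks view):

  -- lock requests that have not been granted
  SELECT relation::regclass AS rel, pid, mode, granted
  FROM pg_locks
  WHERE NOT granted;

Comparing the waiting pid against the granted = true rows on the same
relation then shows which backend holds the conflicting lock.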

-- 
Alvaro Herrera (alvherre[a]alvh.no-ip.org)
"Rebellion is the original virtue of man" (Arthur Schopenhauer)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Dan Harris


On Jul 13, 2005, at 2:17 PM, Stephen Frost wrote:


Could you come up w/ a test case that others could reproduce where
explain isn't returning?


This was simply due to my n00bness :)  I had always been doing  
explain analyze, instead of just explain.  Next time one of these  
queries comes up, I will be sure to do the explain without analyze.


FYI that update query I mentioned in the initial thread just finished  
after updating 8.3 million rows.


-Dan


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] cost-based vacuum

2005-07-13 Thread Simon Riggs
On Wed, 2005-07-13 at 14:58 -0400, Tom Lane wrote:
 Ian Westmacott [EMAIL PROTECTED] writes:
  On Wed, 2005-07-13 at 11:55, Simon Riggs wrote:
  On Tue, 2005-07-12 at 13:50 -0400, Ian Westmacott wrote:
  It appears not to matter whether it is one of the tables
  being written to that is ANALYZEd.  I can ANALYZE an old,
  quiescent table, or a system table and see this effect.
  
  Can you confirm that this effect is still seen even when the ANALYZE
  doesn't touch *any* of the tables being accessed?
 
  Yes.
 
 This really isn't making any sense at all. 

Agreed. I think all of this indicates that some weirdness (technical
term) is happening at a different level in the computing stack. I think
all of this points fairly strongly to it *not* being a PostgreSQL
algorithm problem, i.e. if the code was executed by an idealised Knuth-
like CPU then we would not get this problem. Plus, I have faith that if
it was a problem in that plane then you or another would have
uncovered it by now.

 However, these certainly do not explain Ian's problem, because (a) these
 only apply to VACUUM, not ANALYZE; (b) they would only lock the table
 being VACUUMed, not other ones; (c) if these locks were to block the
 reader or writer thread, it'd manifest as blocking on a semaphore, not
 as a surge in LWLock thrashing.

I've seen enough circumstantial evidence to connect the time spent
inside LWLockAcquire/Release to the semaphore ops within them, not
to the other aspects of the code.

Months ago we discussed the problem of false sharing on closely packed
arrays of shared variables because of the large cache line size of the
Xeon MP. When last we touched on that thought, I focused on the thought
that the LWLock array was too tightly packed for the predefined locks.
What we didn't discuss (because I was too focused on the other array)
was the PGPROC shared array is equally tightly packed, which could give
problems on the semaphores in LWLock.

Intel says fairly clearly that this would be an issue. 

  Is that Xeon MP then?
 
  Yes.
 
 The LWLock activity is certainly suggestive of prior reports of
 excessive buffer manager lock contention, but it makes *no* sense that
 that would be higher with vacuum cost delay than without.  I'd have
 expected the other way around.
 
 I'd really like to see a test case for this...

My feeling is that a micro-architecture test would be more likely to
reveal some interesting information.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Vivek Khera


On Jul 13, 2005, at 2:54 PM, Dan Harris wrote:


4 x 2.2GHz Opterons
12 GB of RAM
4x10k 73GB Ultra320 SCSI drives in RAID 0+1
1GB hardware cache memory on the RAID controller



if it is taking that long to update about 25% of your table, then you
must be I/O bound.  Check I/O while you're running a big query.


also, what RAID controller are you running?  be sure you have the  
latest BIOS and drivers for it.


on a pair of dual opterons, I can do large operations on tables with
100 million rows much faster than you seem to be able to.  I have
MegaRAID 320-2x controllers with 15kRPM drives.


Vivek Khera, Ph.D.
+1-301-869-4449 x806




smime.p7s
Description: S/MIME cryptographic signature


Re: [PERFORM] cost-based vacuum

2005-07-13 Thread Ian Westmacott
I can at least report that the problem does not seem to
occur with Postgres 8.0.1 running on a dual Opteron.

--Ian


On Wed, 2005-07-13 at 16:39, Simon Riggs wrote:
 On Wed, 2005-07-13 at 14:58 -0400, Tom Lane wrote:
  Ian Westmacott [EMAIL PROTECTED] writes:
   On Wed, 2005-07-13 at 11:55, Simon Riggs wrote:
   On Tue, 2005-07-12 at 13:50 -0400, Ian Westmacott wrote:
   It appears not to matter whether it is one of the tables
   being written to that is ANALYZEd.  I can ANALYZE an old,
   quiescent table, or a system table and see this effect.
   
   Can you confirm that this effect is still seen even when the ANALYZE
   doesn't touch *any* of the tables being accessed?
  
   Yes.
  
  This really isn't making any sense at all. 
 
 Agreed. I think all of this indicates that some weirdness (technical
 term) is happening at a different level in the computing stack. I think
 all of this points fairly strongly to it *not* being a PostgreSQL
 algorithm problem, i.e. if the code was executed by an idealised Knuth-
 like CPU then we would not get this problem. Plus, I have faith that if
 it was a problem in that plane then you or another would have
 uncovered it by now.
 
  However, these certainly do not explain Ian's problem, because (a) these
  only apply to VACUUM, not ANALYZE; (b) they would only lock the table
  being VACUUMed, not other ones; (c) if these locks were to block the
  reader or writer thread, it'd manifest as blocking on a semaphore, not
  as a surge in LWLock thrashing.
 
 I've seen enough circumstantial evidence to connect the time spent
 inside LWLockAcquire/Release to the semaphore ops within them, not
 to the other aspects of the code.
 
 Months ago we discussed the problem of false sharing on closely packed
 arrays of shared variables because of the large cache line size of the
 Xeon MP. When last we touched on that thought, I focused on the thought
 that the LWLock array was too tightly packed for the predefined locks.
 What we didn't discuss (because I was too focused on the other array)
 was that the PGPROC shared array is equally tightly packed, which could
 give problems on the semaphores in LWLock.
 
 Intel says fairly clearly that this would be an issue. 
 
   Is that Xeon MP then?
  
   Yes.
  
  The LWLock activity is certainly suggestive of prior reports of
  excessive buffer manager lock contention, but it makes *no* sense that
  that would be higher with vacuum cost delay than without.  I'd have
  expected the other way around.
  
  I'd really like to see a test case for this...
 
 My feeling is that a micro-architecture test would be more likely to
 reveal some interesting information.
 
 Best Regards, Simon Riggs
 
 
 ---(end of broadcast)---
 TIP 6: explain analyze is your friend


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Simon Riggs
On Wed, 2005-07-13 at 12:54 -0600, Dan Harris wrote:
 For example, as I'm writing this, I am running an UPDATE statement  
 that will affect a small part of the table, and is querying on an  
 indexed boolean field.

An indexed boolean field?

Hopefully, ftindex is false for very few rows of the table?

Try changing the ftindex to be a partial index, so only index the false
values. Or don't index it at all.
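
A sketch of the partial-index suggestion, with an invented index name:

  DROP INDEX eventactivity_ftindex_idx;
  -- index only the minority value, so the index stays small
  CREATE INDEX eventactivity_ftindex_false_idx
      ON eventactivity (ftindex) WHERE ftindex = false;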

Split the table up into smaller pieces.

Don't use an UPDATE statement. Keep a second table, and insert records
into it when you would have updated previously. If a row is not found,
you know that it has ftindex=true. That way, you'll never have row
versions building up in the main table, which you'll still get even if
you VACUUM.
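
And a sketch of the second-table idea, again with invented names:

  -- instead of UPDATE eventactivity SET ftindex = false ...
  CREATE TABLE eventactivity_noftindex (
      incidentid character varying(40) PRIMARY KEY
  );
  -- the old UPDATE becomes an INSERT; a row that is absent from
  -- this table is taken to have ftindex = true
  INSERT INTO eventactivity_noftindex VALUES ('12345');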

Best Regards, Simon Riggs




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[PERFORM] Slow Query

2005-07-13 Thread Marc McIntyre

Hi,

I'm having a problem with a query that performs a sequential scan on a 
table when it should be performing an index scan. The interesting thing 
is, when we dumped the database on another server, it performed an index 
scan on that server. The systems are running the same versions of 
postgres (7.4.8) and the problem persists after running an ANALYZE 
VERBOSE and after a REINDEX TABLE sq_ast FORCE. The only difference
that I can see is that the postgresql.conf files differ slightly, and
the hardware is different. Note that the system performing the 
sequential scan is a Dual 2.8GHz Xeon, 4GB Ram, 300GB HDD. And the 
system performing an index scan is not as powerful.


A copy of the postgresql.conf for the system performing the index scan 
can be found at http://beta.squiz.net/~mmcintyre/postgresql_squiz_uk.conf
A copy of the postgresql.conf for the system performing the sequential 
scan can be found at http://beta.squiz.net/~mmcintyre/postgresql_future.conf


The Query:

SELECT a.assetid, a.short_name, a.type_code, a.status, l.linkid,
       l.link_type, l.sort_order, lt.num_kids, u.url, ap.path,
       CASE u.http
           WHEN '1' THEN 'http'
           WHEN '0' THEN 'https'
       END AS protocol
FROM ((sq_ast a LEFT JOIN sq_ast_url u ON a.assetid = u.assetid)
      LEFT JOIN sq_ast_path ap ON a.assetid = ap.assetid),
     sq_ast_lnk l, sq_ast_lnk_tree lt
WHERE a.assetid = l.minorid AND
      l.linkid = lt.linkid AND l.majorid = '2' AND
      l.link_type = 2
ORDER BY sort_order


The EXPLAIN ANALYZE from the system performing a sequential scan:

QUERY PLAN
Sort  (cost=30079.79..30079.89 rows=42 width=113) (actual time=39889.989..39890.346 rows=260 loops=1)
  Sort Key: l.sort_order
  ->  Nested Loop  (cost=25638.02..30078.65 rows=42 width=113) (actual time=9056.336..39888.557 rows=260 loops=1)
        ->  Merge Join  (cost=25638.02..29736.01 rows=25 width=109) (actual time=9056.246..39389.359 rows=260 loops=1)
              Merge Cond: (("outer".assetid)::text = "inner"."?column5?")
              ->  Merge Left Join  (cost=25410.50..29132.82 rows=150816 width=97) (actual time=8378.176..38742.111 rows=150567 loops=1)
                    Merge Cond: (("outer".assetid)::text = ("inner".assetid)::text)
                    ->  Merge Left Join  (cost=25410.50..26165.14 rows=150816 width=83) (actual time=8378.130..9656.413 rows=150489 loops=1)
                          Merge Cond: ("outer"."?column5?" = "inner"."?column4?")
                          ->  Sort  (cost=25408.17..25785.21 rows=150816 width=48) (actual time=8377.733..8609.218 rows=150486 loops=1)
                                Sort Key: (a.assetid)::text
                                ->  Seq Scan on sq_ast a  (cost=0.00..12436.16 rows=150816 width=48) (actual time=0.011..5578.231 rows=151378 loops=1)
                          ->  Sort  (cost=2.33..2.43 rows=37 width=43) (actual time=0.364..0.428 rows=37 loops=1)
                                Sort Key: (u.assetid)::text
                                ->  Seq Scan on sq_ast_url u  (cost=0.00..1.37 rows=37 width=43) (actual time=0.023..0.161 rows=37 loops=1)
                    ->  Index Scan using sq_ast_path_ast on sq_ast_path ap  (cost=0.00..2016.98 rows=45893 width=23) (actual time=0.024..14041.571 rows=45812 loops=1)
              ->  Sort  (cost=227.52..227.58 rows=25 width=21) (actual time=131.838..132.314 rows=260 loops=1)
                    Sort Key: (l.minorid)::text
                    ->  Index Scan using sq_ast_lnk_majorid on sq_ast_lnk l  (cost=0.00..226.94 rows=25 width=21) (actual time=0.169..126.201 rows=260 loops=1)
                          Index Cond: ((majorid)::text = '2'::text)
                          Filter: (link_type = 2)
        ->  Index Scan using sq_ast_lnk_tree_linkid on sq_ast_lnk_tree lt  (cost=0.00..13.66 rows=3 width=8) (actual time=1.539..1.900 rows=1 loops=260)
              Index Cond: ("outer".linkid = lt.linkid)
Total runtime: 39930.395 ms


The EXPLAIN ANALYZE from the system performing an index scan:


Sort  (cost=16873.64..16873.74 rows=40 width=113) (actual time=2169.905..2169.912 rows=13 loops=1)
  Sort Key: l.sort_order
  ->  Nested Loop  (cost=251.39..16872.58 rows=40 width=113) (actual time=45.724..2169.780 rows=13 loops=1)
        ->  Merge Join  (cost=251.39..16506.42 rows=32 width=109) (actual time=45.561..2169.012 rows=13 loops=1)
              Merge Cond: (("outer".assetid)::text = "inner"."?column5?")
              ->  Merge Left Join  (cost=2.33..15881.92 rows=149982 width=97) (actual time=0.530..1948.718 rows=138569 loops=1)
                    Merge Cond: (("outer".assetid)::text = ("inner".assetid)::text)
                    ->  Merge Left Join  (cost=2.33..13056.04 rows=149982 width=83) (actual time=0.406..953.781 rows=138491 loops=1)
                          Merge Cond: (("outer".assetid)::text = "inner"."?column4?")
                          ->  Index Scan using sq_ast_pkey on sq_ast a  (cost=0.00..14952.78 rows=149982 width=48) (actual time=0.154..388.872 rows=138488
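
One standard way to probe a plan difference like this (not mentioned in
the thread, but session-local and easily reverted) is to take
sequential scans away from the planner and re-run the query:

  SET enable_seqscan = off;
  -- re-run the EXPLAIN ANALYZE above, then restore the default:
  SET enable_seqscan = on;

If the index plan then wins by a wide margin, cost settings such as
random_page_cost and effective_cache_size are worth comparing across
the two postgresql.conf files.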

Re: [PERFORM] performance problems ... 100% cpu utilization

2005-07-13 Thread Qingqing Zhou

Dennis [EMAIL PROTECTED] writes:

 checking the status of connections at this point ( ps -eaf | grep
 postgres:)  where the CPU is maxed out I saw this:

 127 idle
 12 bind
 38 parse
 34 select


Are you sure the 100% CPU usage is solely contributed by PostgreSQL? Also,
from the ps status you list, I can hardly see that there's a problem of
the kind you mention below.


 I know there has been discussion about problems on Xeon MP systems. Is
 this what we are running into? Or is something else going on? Is there
 other information I can provide that might help determine what is
 going on?


Here is a thread about the Xeon MP spinlock contention problem:
http://archives.postgresql.org/pgsql-performance/2005-05/msg00441.php


Regards,
Qingqing



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] performance problems ... 100% cpu utilization

2005-07-13 Thread Dennis

Qingqing Zhou wrote:


Are you sure the 100% CPU usage is solely contributed by PostgreSQL? Also,
from the ps status you list, I can hardly see that there's a problem of
the kind you mention below.
 

The PostgreSQL processes are what is taking up all the CPU. There aren't
any other major applications on the machine. It's a dedicated database
server, only for this application.


It doesn't seem to make sense that PostgreSQL would be maxed out at this
point. I think, given the size of the box, it could do quite a bit
better. So, what is going on? I don't know.


Dennis

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] performance problems ... 100% cpu utilization

2005-07-13 Thread David Mitchell
What is the load average on this machine? Do you do many updates? If you 
do a lot of updates, perhaps you haven't vacuumed recently. We were 
seeing similar symptoms when we started load testing our stuff and it 
turned out we were vacuuming too infrequently.
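
A quick way to check that theory from SQL:

  -- reports, table by table, how many dead row versions were removed
  VACUUM VERBOSE;

Large numbers of removed or nonremovable row versions relative to table
size suggest the vacuum interval is too long.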


David

Dennis wrote:

Qingqing Zhou wrote:

Are you sure the 100% CPU usage is solely contributed by PostgreSQL?
Also, from the ps status you list, I can hardly see that there's a
problem of the kind you mention below.
 

The PostgreSQL processes are what is taking up all the CPU. There aren't
any other major applications on the machine. It's a dedicated database
server, only for this application.


It doesn't seem to make sense that PostgreSQL would be maxed out at this
point. I think, given the size of the box, it could do quite a bit
better. So, what is going on? I don't know.


Dennis

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq





--
David Mitchell
Software Engineer
Telogis

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] lots of updates on small table

2005-07-13 Thread Alison Winters
Hi,

Our application requires a number of processes to select and update rows
from a very small (10 rows) Postgres table on a regular and frequent
basis.  These processes often run for weeks at a time, but over the
space of a few days we find that updates start getting painfully slow.
We are running a full vacuum/analyze and reindex on the table every day,
but the updates keep getting slower and slower until the processes are
restarted.  Restarting the processes isn't really a viable option in our
24/7 production environment, so we're trying to figure out what's
causing the slow updates.

The environment is as follows:

Red Hat 9, kernel 2.4.20-8
PostgreSQL 7.3.2
ecpg 2.10.0

The processes are all compiled C programs accessing the database using
ECPG.

Does anyone have any thoughts on what might be happening here?

Thanks
Alison


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings