Re: [PERFORM] performance problems ... 100 cpu utilization

2005-07-13 Thread Dennis

David Mitchell wrote:

What is the load average on this machine? Do you do many updates? If 
you do a lot of updates, perhaps you haven't vacuumed recently. We 
were seeing similar symptoms when we started load testing our stuff 
and it turned out we were vacuuming too infrequently.


The load average at the 100% utilization point was about 30! A vacuum 
analyze was done before the test was started. I believe there are many 
more selects than updates happening at any one time.


Dennis

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Dan Harris


On Jul 14, 2005, at 12:12 AM, Greg Stark wrote:


Dan Harris <[EMAIL PROTECTED]> writes:



I keep the entire database vacuumed regularly.



How often is "regularly"?

Well, once every day, but there aren't a ton of inserts or updates
going on a daily basis.  Maybe 1,000 total inserts?


Also, if you've done occasional massive batch updates like you describe here
you may need a VACUUM FULL or alternatively a CLUSTER command to compact the
table -- vacuum identifies the free space but if you've doubled the size of
your table with a large update that's a lot more free space than you want
hanging around waiting to be used.

I have a feeling I'm going to need to do a cluster soon.  I have done  
several mass deletes and reloads on it.




For example, as I'm writing this, I am running an UPDATE statement that will
affect a small part of the table, and is querying on an indexed boolean field.

...

update eventactivity set ftindex = false where ftindex = true;
( added the where clause because I don't want to alter where ftindex is null )



It's definitely worthwhile doing an "EXPLAIN UPDATE..." to see if this even
used the index. It sounds like it did a sequential scan.


I tried that, and indeed it was using an index, although after  
reading Simon's post, I realize that was kind of dumb to have an  
index on a bool. I have since removed it.


Sequential scans during updates are especially painful. If there isn't free
space lying around in the page where the updated record lies then another page
has to be used or a new page added. If you're doing a massive update you can
exhaust the free space available making the update have to go back and forth
between the page being read and the end of the table where pages are being
written.


This is great info, thanks.





#

vmstat output ( as I am waiting for this to finish ):
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd    free   buff    cache   si   so    bi    bo   in    cs us sy id wa
 0  1   5436 2823908  26140 9183704    0    1  2211   540  694   336  9  2 76 13




[I assume you ran "vmstat 10" or some other interval and then waited for at
least the second line? The first line outputted from vmstat is mostly
meaningless]


Yeah, this was at least 10 or so down the list ( the last one before  
ctrl-c )




Um. That's a pretty meager i/o rate. Just over 2MB/s. The cpu is 76% idle
which sounds fine but that could be one processor pegged at 100% while the
others are idle. If this query is the only one running on the system then it
would behave just like that.

Well, none of my processors had ever reached 100% until I changed to
ext2 today ( read below for more info )


Is it possible you have some foreign keys referencing these records that
you're updating? In which case every record being updated might be causing a
full table scan on another table (or multiple other tables). If those tables
are entirely in cache then it could cause these high cpu low i/o symptoms.




No foreign keys or triggers.


Ok, so I remounted this drive as ext2 shortly before sending my first
email today.  At that point I hadn't had enough time to notice the
ABSOLUTELY HUGE difference in performance.  Ext3 must really be crappy
for postgres, or at least it is on this box.  Now that it's ext2, this
thing is flying like never before.  My CPU utilization has
skyrocketed, telling me that the disk IO was constraining it immensely.


I always knew that it might be a little faster, but the box feels
like it can "breathe" again, and things that used to be IO intensive
and run for an hour or more are now running in < 5 minutes.  I'm a
little worried about not having a journaled filesystem, but that
performance difference will keep me from switching back ( at least to
ext3! ).  Maybe someday I will try XFS.


I would be surprised if everyone who ran ext3 had this kind of  
problem, maybe it's specific to my kernel, raid controller, I don't  
know.  But, this is amazing.  It's like I have a new server.


Thanks to everyone for their valuable input and a big thanks to all  
the dedicated pg developers on here who make this possible!


-Dan


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Profiler for PostgreSQL

2005-07-13 Thread Bruce Momjian
Agha Asif Raza wrote:
> Is there any MS-SQL Server like 'Profiler' available for PostgreSQL? A 
> profiler is a tool that monitors the database server and outputs a detailed 
> trace of all the transactions/queries that are executed on a database during 
> a specified period of time. Kindly let me know if any of you knows of such a 
> tool for PostgreSQL.
>  Agha Asif Raza

Sure, see log_statement in postgresql.conf.  There are a lot of settings
in there to control what is logged.
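For example, a minimal statement-logging setup in postgresql.conf might look like the following sketch (values are illustrative; check the settings available in your server version):

```ini
# Log every statement the server executes
log_statement = 'all'

# Additionally log the duration of any statement running
# longer than 1000 ms
log_min_duration_statement = 1000
```

Reload the server (e.g. with pg_ctl reload) for the changes to take effect.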

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---


[PERFORM] JFS fastest filesystem for PostgreSQL?

2005-07-13 Thread Jeffrey W. Baker
[reposted due to delivery error -jwb]

I just took delivery of a new system, and used the opportunity to
benchmark postgresql 8.0 performance on various filesystems.  The system
in question runs Linux 2.6.12, has one CPU and 1GB of system memory, and
5 7200RPM SATA disks attached to an Areca hardware RAID controller
having 128MB of cache.  The caches are all write-back.

I ran pgbench with a scale factor of 1000 and a total of 100,000
transactions per run.  I varied the number of clients between 10 and
100.  It appears from my test that JFS is much faster than both ext3 and XFS
for this workload.  JFS and XFS were made with the mkfs defaults.  ext3
was made with -T largefile4 and -E stride=32.  The deadline scheduler
was used for all runs (anticipatory scheduler is much worse).

Here's the result, in transactions per second.

             ext3   jfs   xfs
-------------------------------
 10 Clients    55    81    68
100 Clients    61   100    64


-jwb

---(end of broadcast)---


Re: [PERFORM] Profiler for PostgreSQL

2005-07-13 Thread Christopher Kings-Lynne

Try turning on query logging and using the 'pqa' utility on pgfoundry.org.

Chris

Agha Asif Raza wrote:
Is there any MS-SQL Server like 'Profiler' available for PostgreSQL? A 
profiler is a tool that monitors the database server and outputs a 
detailed trace of all the transactions/queries that are executed on a 
database during a specified period of time. Kindly let me know if any of 
you knows of such a tool for PostgreSQL.
 
Agha Asif Raza



---(end of broadcast)---


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Greg Stark
Dan Harris <[EMAIL PROTECTED]> writes:

> I keep the entire database vacuumed regularly.

How often is "regularly"? We get frequent posts from people who think daily or
every 4 hours is often enough. If the table is very busy you can need vacuums
as often as every 15 minutes. 

Also, if you've done occasional massive batch updates like you describe here
you may need a VACUUM FULL or alternatively a CLUSTER command to compact the
table -- vacuum identifies the free space but if you've doubled the size of
your table with a large update that's a lot more free space than you want
hanging around waiting to be used.
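As a sketch, compacting the table discussed in this thread could look like this (CLUSTER rewrites the table in index order; the index name below is hypothetical):

```sql
-- Reclaim the dead space left behind by a mass update.
-- VACUUM FULL rewrites the table and takes an exclusive lock:
VACUUM FULL ANALYZE eventactivity;

-- Alternatively, CLUSTER rewrites the table sorted by an index
-- (hypothetical index name), which also discards the dead space:
CLUSTER eventactivity_pkey ON eventactivity;
```

Both commands lock the table against concurrent use, so they are best run in a maintenance window.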

> For example, as I'm writing this, I am running an UPDATE statement  that will
> affect a small part of the table, and is querying on an  indexed boolean 
> field.
...
> update eventactivity set ftindex = false where ftindex = true;  ( added the
> where clause because I don't want to alter where ftindex  is null )

It's definitely worthwhile doing an "EXPLAIN UPDATE..." to see if this even
used the index. It sounds like it did a sequential scan.
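Concretely, that check is just the statement from the thread with EXPLAIN prepended (plain EXPLAIN only plans the query, it does not execute it):

```sql
EXPLAIN UPDATE eventactivity SET ftindex = false WHERE ftindex = true;
-- Look for "Index Scan using ..." versus "Seq Scan on eventactivity"
-- in the top line of the resulting plan.
```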

Sequential scans during updates are especially painful. If there isn't free
space lying around in the page where the updated record lies then another page
has to be used or a new page added. If you're doing a massive update you can
exhaust the free space available making the update have to go back and forth
between the page being read and the end of the table where pages are being
written.

> #
> 
> vmstat output ( as I am waiting for this to finish ):
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd    free   buff    cache   si   so    bi    bo   in    cs us sy id wa
>  0  1   5436 2823908  26140 9183704    0    1  2211   540  694   336  9  2 76 13

[I assume you ran "vmstat 10" or some other interval and then waited for at
least the second line? The first line outputted from vmstat is mostly
meaningless]

Um. That's a pretty meager i/o rate. Just over 2MB/s. The cpu is 76% idle
which sounds fine but that could be one processor pegged at 100% while the
others are idle. If this query is the only one running on the system then it
would behave just like that.

Is it possible you have some foreign keys referencing these records that
you're updating? In which case every record being updated might be causing a
full table scan on another table (or multiple other tables). If those tables
are entirely in cache then it could cause these high cpu low i/o symptoms.
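The usual fix for that pattern is an index on the referencing column, since PostgreSQL does not create one automatically for foreign keys (the table and column names below are hypothetical):

```sql
-- Hypothetical referencing table: each FK check effectively runs
--   SELECT 1 FROM eventdetail WHERE eventid = ...
-- and without an index that is a sequential scan per updated row.
CREATE INDEX eventdetail_eventid_idx ON eventdetail (eventid);
```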

Or are there any triggers on this table?


-- 
greg


---(end of broadcast)---


[PERFORM] Profiler for PostgreSQL

2005-07-13 Thread Agha Asif Raza
Is there any MS-SQL Server like 'Profiler' available for PostgreSQL? A profiler is a tool that monitors the database server and outputs a detailed trace of all the transactions/queries that are executed on a database during a specified period of time. Kindly let me know if any of you knows of such a tool for PostgreSQL.

 
Agha Asif Raza


[PERFORM] lots of updates on small table

2005-07-13 Thread Alison Winters
Hi,

Our application requires a number of processes to select and update rows
from a very small (<10 rows) Postgres table on a regular and frequent
basis.  These processes often run for weeks at a time, but over the
space of a few days we find that updates start getting painfully slow.
We are running a full vacuum/analyze and reindex on the table every day,
but the updates keep getting slower and slower until the processes are
restarted.  Restarting the processes isn't really a viable option in our
24/7 production environment, so we're trying to figure out what's
causing the slow updates.

The environment is as follows:

Red Hat 9, kernel 2.4.20-8
PostgreSQL 7.3.2
ecpg 2.10.0

The processes are all compiled C programs accessing the database using
ECPG.

Does anyone have any thoughts on what might be happening here?

Thanks
Alison


---(end of broadcast)---


Re: [PERFORM] performance problems ... 100 cpu utilization

2005-07-13 Thread David Mitchell
What is the load average on this machine? Do you do many updates? If you 
do a lot of updates, perhaps you haven't vacuumed recently. We were 
seeing similar symptoms when we started load testing our stuff and it 
turned out we were vacuuming too infrequently.


David

Dennis wrote:

Qingqing Zhou wrote:

Are you sure 100% CPU usage is solely contributed by PostgreSQL? Also, from
the ps status you list, I can hardly see that this is a problem caused by
the issue you mentioned below.

The PostgreSQL processes are what is taking up all the CPU. There aren't
any other major applications on the machine. It's a dedicated database
server, only for this application.


It doesn't seem to make sense that PostgreSQL would be maxed out at this 
point. I think given the size of the box, it could do quite a bit 
better. So, what is going on? I don't know.


Dennis






--
David Mitchell
Software Engineer
Telogis

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] performance problems ... 100 cpu utilization

2005-07-13 Thread Dennis

Qingqing Zhou wrote:


Are you sure 100% CPU usage is solely contributed by PostgreSQL? Also, from
the ps status you list, I can hardly see that this is a problem caused by
the issue you mentioned below.

The PostgreSQL processes are what is taking up all the CPU. There aren't
any other major applications on the machine. It's a dedicated database
server, only for this application.


It doesn't seem to make sense that PostgreSQL would be maxed out at this 
point. I think given the size of the box, it could do quite a bit 
better. So, what is going on? I don't know.


Dennis

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] performance problems ... 100 cpu utilization

2005-07-13 Thread Qingqing Zhou

"Dennis" <[EMAIL PROTECTED]> writes
>
> checking the status of connections at this point ( ps -eaf | grep
> "postgres:")  where the CPU is maxed out I saw this:
>
> 127 idle
> 12 bind
> 38 parse
> 34 select
>

Are you sure 100% CPU usage is solely contributed by PostgreSQL? Also, from
the ps status you list, I can hardly see that this is a problem caused by
the issue you mentioned below.

>
> I know there has been discussion about problems on Xeon MP systems. Is
> this what we are running into? Or is something else going on? Is there
> other information I can provide that might help determine what is going
on?
>

Here is a talk about Xeon-SMP spinlock contention problem:
http://archives.postgresql.org/pgsql-performance/2005-05/msg00441.php


Regards,
Qingqing



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] Slow Query

2005-07-13 Thread Marc McIntyre

Hi,

I'm having a problem with a query that performs a sequential scan on a 
table when it should be performing an index scan. The interesting thing 
is, when we dumped the database on another server, it performed an index 
scan on that server. The systems are running the same versions of 
postgres (7.4.8) and the problem persists after running an "ANALYZE 
VERBOSE" and after a "REINDEX TABLE sq_ast FORCE". The only difference 
that I can see is that the postgresql.conf files differ slightly, and 
the hardware is different. Note that the system performing the 
sequential scan is a Dual 2.8GHz Xeon, 4GB RAM, 300GB HDD. And the 
system performing an index scan is not as powerful.


A copy of the postgresql.conf for the system performing the index scan 
can be found at http://beta.squiz.net/~mmcintyre/postgresql_squiz_uk.conf
A copy of the postgresql.conf for the system performing the sequential 
scan can be found at http://beta.squiz.net/~mmcintyre/postgresql_future.conf


The Query:

SELECT a.assetid, a.short_name, a.type_code, a.status, l.linkid, 
l.link_type, l.sort_order, lt.num_kids, u.url, ap.path,

CASE u.http
   WHEN '1' THEN 'http'
   WHEN '0' THEN 'https'
END AS protocol
FROM ((sq_ast a LEFT JOIN sq_ast_url u ON a.assetid = u.assetid) LEFT 
JOIN sq_ast_path ap ON a.assetid = ap.assetid),sq_ast_lnk l, 
sq_ast_lnk_tree lt WHERE a.assetid = l.minorid AND

 l.linkid = lt.linkid AND l.majorid = '2' AND
 l.link_type <= 2 ORDER BY sort_order


The EXPLAIN ANALYZE from the system performing an sequential scan:

QUERY PLAN
Sort  (cost=30079.79..30079.89 rows=42 width=113) (actual 
time=39889.989..39890.346 rows=260 loops=1)
 Sort Key: l.sort_order
 ->  Nested Loop  (cost=25638.02..30078.65 rows=42 width=113) (actual 
time=9056.336..39888.557 rows=260 loops=1)
   ->  Merge Join  (cost=25638.02..29736.01 rows=25 width=109) (actual 
time=9056.246..39389.359 rows=260 loops=1)
 Merge Cond: (("outer".assetid)::text = "inner"."?column5?")
 ->  Merge Left Join  (cost=25410.50..29132.82 rows=150816 
width=97) (actual time=8378.176..38742.111 rows=150567 loops=1)
   Merge Cond: (("outer".assetid)::text = 
("inner".assetid)::text)
   ->  Merge Left Join  (cost=25410.50..26165.14 rows=150816 
width=83) (actual time=8378.130..9656.413 rows=150489 loops=1)
 Merge Cond: ("outer"."?column5?" = "inner"."?column4?")
 ->  Sort  (cost=25408.17..25785.21 rows=150816 
width=48) (actual time=8377.733..8609.218 rows=150486 loops=1)
   Sort Key: (a.assetid)::text
   ->  Seq Scan on sq_ast a  (cost=0.00..12436.16 
rows=150816 width=48) (actual time=0.011..5578.231 rows=151378 loops=1)
 ->  Sort  (cost=2.33..2.43 rows=37 width=43) (actual 
time=0.364..0.428 rows=37 loops=1)
   Sort Key: (u.assetid)::text
   ->  Seq Scan on sq_ast_url u  (cost=0.00..1.37 
rows=37 width=43) (actual time=0.023..0.161 rows=37 loops=1)
   ->  Index Scan using sq_ast_path_ast on sq_ast_path ap  
(cost=0.00..2016.98 rows=45893 width=23) (actual time=0.024..14041.571 rows=45812 
loops=1)
 ->  Sort  (cost=227.52..227.58 rows=25 width=21) (actual 
time=131.838..132.314 rows=260 loops=1)
   Sort Key: (l.minorid)::text
   ->  Index Scan using sq_ast_lnk_majorid on sq_ast_lnk l  
(cost=0.00..226.94 rows=25 width=21) (actual time=0.169..126.201 rows=260 loops=1)
 Index Cond: ((majorid)::text = '2'::text)
 Filter: (link_type <= 2)
   ->  Index Scan using sq_ast_lnk_tree_linkid on sq_ast_lnk_tree lt  
(cost=0.00..13.66 rows=3 width=8) (actual time=1.539..1.900 rows=1 loops=260)
 Index Cond: ("outer".linkid = lt.linkid)
Total runtime: 39930.395 ms


The EXPLAIN ANALYZE from the system performing an index scan:


Sort  (cost=16873.64..16873.74 rows=40 width=113) (actual 
time=2169.905..2169.912 rows=13 loops=1)
  Sort Key: l.sort_order
  ->  Nested Loop  (cost=251.39..16872.58 rows=40 width=113) (actual 
time=45.724..2169.780 rows=13 loops=1)
->  Merge Join  (cost=251.39..16506.42 rows=32 width=109) (actual 
time=45.561..2169.012 rows=13 loops=1)
  Merge Cond: (("outer".assetid)::text = "inner"."?column5?")
  ->  Merge Left Join  (cost=2.33..15881.92 rows=149982 width=97) 
(actual time=0.530..1948.718 rows=138569 loops=1)
Merge Cond: (("outer".assetid)::text = 
("inner".assetid)::text)
->  Merge Left Join  (cost=2.33..13056.04 rows=149982 
width=83) (actual time=0.406..953.781 rows=138491 loops=1)
  Merge Cond: (("outer".assetid)::text = 
"inner"."?column4?")
  ->  Index Scan using sq_ast_pkey on sq_ast a  
(cost=0.00..14952.78 rows=149

Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Simon Riggs
On Wed, 2005-07-13 at 12:54 -0600, Dan Harris wrote:
> For example, as I'm writing this, I am running an UPDATE statement  
> that will affect a small part of the table, and is querying on an  
> indexed boolean field.

An indexed boolean field?

Hopefully, ftindex is false for very few rows of the table?

Try changing the ftindex to be a partial index, so only index the false
values. Or don't index it at all.
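A partial-index version of that suggestion might look like this (both index names are made up; the win is that the index only contains the rare false rows):

```sql
-- Drop the full boolean index (hypothetical name) and replace it
-- with one that only covers the rows still needing work:
DROP INDEX eventactivity_ftindex_idx;
CREATE INDEX eventactivity_needs_ft ON eventactivity (ftindex)
    WHERE ftindex = false;

-- Queries with "WHERE ftindex = false" can now use the small index;
-- rows where ftindex is true or NULL never enter it at all.
```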

Split the table up into smaller pieces.

Don't use an UPDATE statement. Keep a second table, and insert records
into it when you would have updated previously. If a row is not found,
you know that it has ftindex=true. That way, you'll never have row
versions building up in the main table, which you'll still get even if
you VACUUM.
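A sketch of that design, with made-up table and column names (the point is that marking a row becomes an INSERT into a small side table instead of an UPDATE that leaves a dead row version behind in the big table):

```sql
-- Side table holding only the rows that would have had ftindex = false:
CREATE TABLE eventactivity_unindexed (
    eventid integer PRIMARY KEY  -- hypothetical key of eventactivity
);

-- Instead of: UPDATE eventactivity SET ftindex = false WHERE eventid = 42;
INSERT INTO eventactivity_unindexed (eventid) VALUES (42);

-- A row is treated as ftindex = true when it has no entry here:
SELECT e.*
FROM eventactivity e
WHERE NOT EXISTS (
    SELECT 1 FROM eventactivity_unindexed u WHERE u.eventid = e.eventid
);
```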

Best Regards, Simon Riggs




---(end of broadcast)---


Re: [PERFORM] cost-based vacuum

2005-07-13 Thread Ian Westmacott
I can at least report that the problem does not seem to
occur with Postgres 8.0.1 running on a dual Opteron.

--Ian


On Wed, 2005-07-13 at 16:39, Simon Riggs wrote:
> On Wed, 2005-07-13 at 14:58 -0400, Tom Lane wrote:
> > Ian Westmacott <[EMAIL PROTECTED]> writes:
> > > On Wed, 2005-07-13 at 11:55, Simon Riggs wrote:
> > >> On Tue, 2005-07-12 at 13:50 -0400, Ian Westmacott wrote:
> > >>> It appears not to matter whether it is one of the tables
> > >>> being written to that is ANALYZEd.  I can ANALYZE an old,
> > >>> quiescent table, or a system table and see this effect.
> > >> 
> > >> Can you confirm that this effect is still seen even when the ANALYZE
> > >> doesn't touch *any* of the tables being accessed?
> > 
> > > Yes.
> > 
> > This really isn't making any sense at all. 
> 
> Agreed. I think all of this indicates that some weirdness (technical
> term) is happening at a different level in the computing stack. I think
> all of this points fairly strongly to it *not* being a PostgreSQL
> algorithm problem, i.e. if the code was executed by an idealised Knuth-
> like CPU then we would not get this problem. Plus, I have faith that if
> it was a problem in that "plane" then you or another would have
> uncovered it by now.
> 
> > However, these certainly do not explain Ian's problem, because (a) these
> > only apply to VACUUM, not ANALYZE; (b) they would only lock the table
> > being VACUUMed, not other ones; (c) if these locks were to block the
> > reader or writer thread, it'd manifest as blocking on a semaphore, not
> > as a surge in LWLock thrashing.
> 
> I've seen enough circumstantial evidence to connect the time spent
> inside LWLockAcquire/Release as being connected to the Semaphore ops
> within them, not the other aspects of the code.
> 
> Months ago we discussed the problem of false sharing on closely packed
> arrays of shared variables because of the large cache line size of the
> Xeon MP. When last we touched on that thought, I focused on the thought
> that the LWLock array was too tightly packed for the predefined locks.
> What we didn't discuss (because I was too focused on the other array)
> was the PGPROC shared array is equally tightly packed, which could give
> problems on the semaphores in LWLock.
> 
> Intel says fairly clearly that this would be an issue. 
> 
> > >> Is that Xeon MP then?
> > 
> > > Yes.
> > 
> > The LWLock activity is certainly suggestive of prior reports of
> > excessive buffer manager lock contention, but it makes *no* sense that
> > that would be higher with vacuum cost delay than without.  I'd have
> > expected the other way around.
> > 
> > I'd really like to see a test case for this...
> 
> My feeling is that a "micro-architecture" test would be more likely to
> reveal some interesting information.
> 
> Best Regards, Simon Riggs
> 
> 


---(end of broadcast)---


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Vivek Khera


On Jul 13, 2005, at 2:54 PM, Dan Harris wrote:


4 x 2.2GHz Opterons
12 GB of RAM
4x10k 73GB Ultra320 SCSI drives in RAID 0+1
1GB hardware cache memory on the RAID controller



if it is taking that long to update about 25% of your table, then you  
must be I/O bound. check I/o while you're running a big query.


also, what RAID controller are you running?  be sure you have the  
latest BIOS and drivers for it.


on a pair of dual opterons, I can do large operations on tables with  
100 million rows much faster than you seem to be able.  I have  
MegaRAID 320-2x controllers with 15kRPM drives.


Vivek Khera, Ph.D.
+1-301-869-4449 x806




smime.p7s
Description: S/MIME cryptographic signature


Re: [PERFORM] cost-based vacuum

2005-07-13 Thread Simon Riggs
On Wed, 2005-07-13 at 14:58 -0400, Tom Lane wrote:
> Ian Westmacott <[EMAIL PROTECTED]> writes:
> > On Wed, 2005-07-13 at 11:55, Simon Riggs wrote:
> >> On Tue, 2005-07-12 at 13:50 -0400, Ian Westmacott wrote:
> >>> It appears not to matter whether it is one of the tables
> >>> being written to that is ANALYZEd.  I can ANALYZE an old,
> >>> quiescent table, or a system table and see this effect.
> >> 
> >> Can you confirm that this effect is still seen even when the ANALYZE
> >> doesn't touch *any* of the tables being accessed?
> 
> > Yes.
> 
> This really isn't making any sense at all. 

Agreed. I think all of this indicates that some weirdness (technical
term) is happening at a different level in the computing stack. I think
all of this points fairly strongly to it *not* being a PostgreSQL
algorithm problem, i.e. if the code was executed by an idealised Knuth-
like CPU then we would not get this problem. Plus, I have faith that if
it was a problem in that "plane" then you or another would have
uncovered it by now.

> However, these certainly do not explain Ian's problem, because (a) these
> only apply to VACUUM, not ANALYZE; (b) they would only lock the table
> being VACUUMed, not other ones; (c) if these locks were to block the
> reader or writer thread, it'd manifest as blocking on a semaphore, not
> as a surge in LWLock thrashing.

I've seen enough circumstantial evidence to connect the time spent
inside LWLockAcquire/Release as being connected to the Semaphore ops
within them, not the other aspects of the code.

Months ago we discussed the problem of false sharing on closely packed
arrays of shared variables because of the large cache line size of the
Xeon MP. When last we touched on that thought, I focused on the thought
that the LWLock array was too tightly packed for the predefined locks.
What we didn't discuss (because I was too focused on the other array)
was the PGPROC shared array is equally tightly packed, which could give
problems on the semaphores in LWLock.

Intel says fairly clearly that this would be an issue. 

> >> Is that Xeon MP then?
> 
> > Yes.
> 
> The LWLock activity is certainly suggestive of prior reports of
> excessive buffer manager lock contention, but it makes *no* sense that
> that would be higher with vacuum cost delay than without.  I'd have
> expected the other way around.
> 
> I'd really like to see a test case for this...

My feeling is that a "micro-architecture" test would be more likely to
reveal some interesting information.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Dan Harris


On Jul 13, 2005, at 2:17 PM, Stephen Frost wrote:


Could you come up w/ a test case that others could reproduce where
explain isn't returning?


This was simply due to my n00bness :)  I had always been doing  
explain analyze, instead of just explain.  Next time one of these  
queries comes up, I will be sure to do the explain without analyze.


FYI that update query I mentioned in the initial thread just finished  
after updating 8.3 million rows.


-Dan


---(end of broadcast)---


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Alvaro Herrera
On Wed, Jul 13, 2005 at 01:16:25PM -0600, Dan Harris wrote:

> On Jul 13, 2005, at 1:11 PM, John A Meinel wrote:
> 
> >I might be wrong, but there may be something much more substantially
> >wrong than slow i/o.
> 
> Yes, I'm afraid of that too.  I just don't know what tools I should  
> use to figure that out.  I have some 20 other databases on this  
> system, same schema but varying sizes, and the small ones perform  
> very well.  It feels like there is an O(n) increase in wait time that  
> has recently become very noticeable on the largest of them.

I'd guess it's stuck on some lock.  Try that EXPLAIN, and when it
blocks, watch the pg_locks view for locks not granted to the process
executing the EXPLAIN.  Then check what else is holding the locks.
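A query along these lines shows the ungranted locks and who holds conflicting ones (the column set is roughly that of 7.4/8.0-era pg_locks; adjust for your version):

```sql
-- Locks that some backend is currently waiting on:
SELECT pid, relation, transaction, mode
FROM pg_locks
WHERE NOT granted;

-- Everything granted on those same relations, to identify the holders:
SELECT pid, relation, mode
FROM pg_locks
WHERE granted
  AND relation IN (SELECT relation FROM pg_locks WHERE NOT granted);
```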

-- 
Alvaro Herrera ()
"La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)

---(end of broadcast)---


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Stephen Frost
* Dan Harris ([EMAIL PROTECTED]) wrote:
> On Jul 13, 2005, at 1:11 PM, John A Meinel wrote:
> >I might be wrong, but there may be something much more substantially
> >wrong than slow i/o.
> 
> Yes, I'm afraid of that too.  I just don't know what tools I should  
> use to figure that out.  I have some 20 other databases on this  
> system, same schema but varying sizes, and the small ones perform  
> very well.  It feels like there is an O(n) increase in wait time that  
> has recently become very noticeable on the largest of them.

Could you come up w/ a test case that others could reproduce where
explain isn't returning?  I think that would be very useful towards
solving at least that issue...

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Dan Harris


On Jul 13, 2005, at 1:11 PM, John A Meinel wrote:



I might be wrong, but there may be something much more substantially
wrong than slow i/o.
John



Yes, I'm afraid of that too.  I just don't know what tools I should  
use to figure that out.  I have some 20 other databases on this  
system, same schema but varying sizes, and the small ones perform  
very well.  It feels like there is an O(n) increase in wait time that  
has recently become very noticeable on the largest of them.


-Dan

---(end of broadcast)---


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread John A Meinel

Dan Harris wrote:

Gurus,



> even the explain never finishes when I try that.


Just a short bit. If "EXPLAIN SELECT" doesn't return, there seems to be
a very serious problem, because EXPLAIN doesn't actually run the query,
it just runs the query planner. And the query planner shouldn't
ever get heavily stuck.

I might be wrong, but there may be something much more substantially
wrong than slow i/o.
John
=:->


signature.asc
Description: OpenPGP digital signature


[PERFORM] performance problems ... 100 cpu utilization

2005-07-13 Thread Dennis

Hi,

I've got a java based web application that uses PostgreSQL 8.0.2. 
PostgreSQL runs on its own machine with RHEL 3, ia32e kernel, dual Xeon 
processors, and 4 GB RAM.


The web application runs on a seperate machine from the database. The 
application machine has three tomcat instances configured to use 64 
database connections each using DBCP for pooling. Most of the data 
access is via Hibernate. The database itself is about 100 meg in size.


We're perf testing the application with Loadrunner. At about 500 virtual 
users hitting the web application, the cpu utilization on the database 
server is at 100%, PostgreSQL is on its knees. The memory usage isn't 
bad, the I/O isn't bad, only the CPU seems to be maxed out.


Checking the status of connections ( ps -eaf | grep "postgres:" ) at the 
point where the CPU was maxed out, I saw this:


127 idle
12 bind
38 parse
34 select
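
The state counts above can be tallied with a small pipeline (a sketch; the exact field layout of the "postgres: ..." process title can vary by platform):

```shell
# Count backend states: the last field of the
# "postgres: user db host state" process title is the state.
count_states() {
  grep "postgres:" | awk '{print $NF}' | sort | uniq -c | sort -rn
}
# usage: ps -eaf | count_states
```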

Hibernate is used in the application and unfortunately this seems to 
cause queries not to get logged. (see 
http://archives.postgresql.org/pgsql-admin/2005-05/msg00241.php)


I know there has been discussion about problems on Xeon MP systems. Is 
this what we are running into? Or is something else going on? Is there 
other information I can provide that might help determine what is going on?


Here are the postgresql.conf settings:

# The maximum number of connections.
max_connections = 256

# Standard performance-related settings.
shared_buffers = 16384
max_fsm_pages = 20
max_fsm_relations = 1
fsync = false
wal_sync_method = fsync
wal_buffers = 32
checkpoint_segments = 6
effective_cache_size = 38400
random_page_cost = 2
work_mem = 16384
maintenance_work_mem = 16384

# TODO - need to investigate these.
commit_delay = 0
commit_siblings = 5
max_locks_per_transaction = 512
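
For reference, shared_buffers and effective_cache_size are counted in 8 kB pages in 8.0, so the settings above work out to (simple arithmetic):

```shell
echo $(( 16384 * 8 / 1024 ))   # shared_buffers       -> 128 MB
echo $(( 38400 * 8 / 1024 ))   # effective_cache_size -> 300 MB assumed OS cache
```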


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Dan Harris

So sorry, I forgot to mention I'm running version 8.0.1

Thanks


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] cost-based vacuum

2005-07-13 Thread Tom Lane
Ian Westmacott <[EMAIL PROTECTED]> writes:
> On Wed, 2005-07-13 at 11:55, Simon Riggs wrote:
>> On Tue, 2005-07-12 at 13:50 -0400, Ian Westmacott wrote:
>>> It appears not to matter whether it is one of the tables
>>> being written to that is ANALYZEd.  I can ANALYZE an old,
>>> quiescent table, or a system table and see this effect.
>> 
>> Can you confirm that this effect is still seen even when the ANALYZE
>> doesn't touch *any* of the tables being accessed?

> Yes.

This really isn't making any sense at all.  I took another look through
the vacuum_delay_point() calls, and I can see a couple that are
questionably placed:

* the one in count_nondeletable_pages() is done while we are holding
exclusive lock on the table; we might be better off not to delay there,
so as not to block non-VACUUM processes longer than we have to.

* the ones in hashbulkdelete and rtbulkdelete are done while holding
various forms of exclusive locks on the index (this was formerly true
of gistbulkdelete as well).  Again it might be better not to delay.

However, these certainly do not explain Ian's problem, because (a) these
only apply to VACUUM, not ANALYZE; (b) they would only lock the table
being VACUUMed, not other ones; (c) if these locks were to block the
reader or writer thread, it'd manifest as blocking on a semaphore, not
as a surge in LWLock thrashing.

>> Is that Xeon MP then?

> Yes.

The LWLock activity is certainly suggestive of prior reports of
excessive buffer manager lock contention, but it makes *no* sense that
that would be higher with vacuum cost delay than without.  I'd have
expected the other way around.

I'd really like to see a test case for this...

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Dan Harris

Gurus,

A table in one of my databases has just crossed the 30 million row  
mark and has begun to feel very sluggish for just about anything I do  
with it.  I keep the entire database vacuumed regularly.  And, as  
long as I'm not doing a sequential scan, things seem reasonably quick  
most of the time.  I'm now thinking that my problem is IO because  
anything that involves heavy ( like a seq scan ) IO seems to slow to  
a crawl.  Even if I am using indexed fields to grab a few thousand  
rows, then going to sequential scans it gets very very slow.


I have also had the occurrence where queries will not finish for days  
( I eventually have to kill them ).  I was hoping to provide an  
explain analyze for them, but if they never finish... even the  
explain never finishes when I try that.


For example, as I'm writing this, I am running an UPDATE statement  
that will affect a small part of the table, and is querying on an  
indexed boolean field.


I have been waiting for over an hour and a half as I write this and  
it still hasn't finished.  I'm thinking "I bet Tom, Simon or Josh  
wouldn't put up with this kind of wait time..", so I thought I would  
see if anyone here had some pointers.  Maybe I have a really stupid  
setting in my conf file that is causing this.  I really can't believe  
I am at the limits of this hardware, however.



The query:
update eventactivity set ftindex = false where ftindex = true;  
( added the where clause because I don't want to alter where ftindex  
is null )




The table:
   Column    |            Type             | Modifiers
-------------+-----------------------------+-----------
 entrydate   | timestamp without time zone |
 incidentid  | character varying(40)       |
 statustype  | character varying(20)       |
 unitid      | character varying(20)       |
 recordtext  | character varying(255)      |
 recordtext2 | character varying(255)      |
 insertdate  | timestamp without time zone |
 ftindex     | boolean                     |
Indexes: eventactivity1 btree (incidentid),
         eventactivity_entrydate_idx btree (entrydate),
         eventactivity_ftindex_idx btree (ftindex),
         eventactivity_oid_idx btree (oid)
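
One aside (an assumption about the workload, not something from the original post): a plain btree on a boolean column has such low selectivity that the planner will rarely choose it, and the UPDATE above is likely doing a sequential scan anyway. If only the true rows matter, a partial index is usually the better fit. A sketch:

```sql
-- Hypothetical alternative to eventactivity_ftindex_idx:
-- index only the rows still waiting to be full-text indexed.
CREATE INDEX eventactivity_ftindex_true_idx
    ON eventactivity (incidentid)
    WHERE ftindex = true;
```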




The hardware:

4 x 2.2GHz Opterons
12 GB of RAM
4x10k 73GB Ultra320 SCSI drives in RAID 0+1
1GB hardware cache memory on the RAID controller

The OS:
Fedora, kernel 2.6.6-1.435.2.3smp ( redhat stock kernel )
filesystem is mounted as ext2

#

vmstat output ( as I am waiting for this to finish ):
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  1   5436 2823908  26140 9183704    0    1  2211   540  694   336  9  2 76 13


#

iostat output ( as I am waiting for this to finish ):
avg-cpu:  %user   %nice    %sys %iowait   %idle
           9.19    0.00    2.19   13.08   75.53

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
cciss/c0d0  329.26 17686.03  4317.57  161788630   39496378


#
This is a dedicated postgresql server, so maybe some of these  
settings are more liberal than they should be?


relevant ( I hope ) postgresql.conf options are:

shared_buffers = 5
effective_cache_size = 1348000
random_page_cost = 3
work_mem = 512000
max_fsm_pages = 8
log_min_duration_statement = 6
fsync = true ( not sure if I'm daring enough to run without this )
wal_buffers = 1000
checkpoint_segments = 64
checkpoint_timeout = 3000


# FOR PG_AUTOVACUUM --#
stats_command_string = true
stats_row_level = true
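
One observation on the settings above: work_mem is specified in kB, so this configuration permits (simple arithmetic):

```shell
echo $(( 512000 / 1024 ))   # ~500 MB per sort/hash step, per backend
```

Since each backend can use that much for every sort or hash node in a plan, a handful of concurrent big queries could push the box into swap.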

Thanks in advance,
Dan








---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] cost-based vacuum

2005-07-13 Thread Ian Westmacott
On Wed, 2005-07-13 at 11:55, Simon Riggs wrote:
> On Tue, 2005-07-12 at 13:50 -0400, Ian Westmacott wrote:
> > It appears not to matter whether it is one of the tables
> > being written to that is ANALYZEd.  I can ANALYZE an old,
> > quiescent table, or a system table and see this effect.
> 
> Can you confirm that this effect is still seen even when the ANALYZE
> doesn't touch *any* of the tables being accessed?

Yes.

> > - this is a dual Xeon. 
> 
> Is that Xeon MP then?

Yes.

> > - Looking at oprofile reports for 10-minute runs of a
> >   database-wide VACUUM with vacuum_cost_delay=0 and 1000,
> >   shows the latter spending a lot of time in LWLockAcquire
> >   and LWLockRelease (20% each vs. 2%).
> 
> Is this associated with high context switching also?

Yes, it appears that context switches increase up to 4-5x
during cost-based ANALYZE.

--Ian



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] General DB Tuning

2005-07-13 Thread Simon Riggs
On Wed, 2005-07-13 at 09:52 +0800, Christopher Kings-Lynne wrote:
> > Is there a different kind of 'prepared' statements
> > that we should be using in the driver to get logging
> > to work properly?  What is the 'new' protocol?
> 
> The 8.0.2 jdbc driver uses real prepared statements instead of faked 
> ones.  The problem is the new protocol (that the 8.0.2 driver uses) has 
> a bug where protocol-prepared queries don't get logged properly.
> 
> I don't know if it's been fixed...

Yes, there is a fix for this in 8.1

Brent has been sent the details.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] size of cache

2005-07-13 Thread Simon Riggs
On Wed, 2005-07-13 at 10:20 +0200, Jean-Max Reymond wrote:
> with my application, it seems that size of cache has great effect:
> from 512 Kb of L2 cache to 1Mb boost performance with a factor 3 and
> 20% again from 1Mb L2 cache to 2Mb L2 cache.

Memory request time is the main bottleneck in well tuned database
systems, so your results could be reasonable. 

> I don't understand why a 512Kb cache L2 is too small to fit the data's
> does it exist a tool to trace processor activity and confirm that
> processor is waiting for memory ?

You have both data and instruction cache on the CPU. It is likely it is
the instruction cache that is too small to fit all of the code required
for your application's workload mix.

Use Intel VTune or similar to show the results you seek. 

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] cost-based vacuum

2005-07-13 Thread Simon Riggs
On Tue, 2005-07-12 at 13:50 -0400, Ian Westmacott wrote:
> It appears not to matter whether it is one of the tables
> being written to that is ANALYZEd.  I can ANALYZE an old,
> quiescent table, or a system table and see this effect.

Can you confirm that this effect is still seen even when the ANALYZE
doesn't touch *any* of the tables being accessed?

> - this is a dual Xeon. 

Is that Xeon MP then?

> - Looking at oprofile reports for 10-minute runs of a
>   database-wide VACUUM with vacuum_cost_delay=0 and 1000,
>   shows the latter spending a lot of time in LWLockAcquire
>   and LWLockRelease (20% each vs. 2%).

Is this associated with high context switching also?

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] (pas de sujet)

2005-07-13 Thread Nicolas Beaume

Nicolas,

These sizes would not be considered large. I would leave them
as single tables.

Ken




OK, I thought it was large, but I must confess I'm relatively new to the 
database world. Thank you for the answer.


Just another question: what is the maximum number of rows that a cursor 
can contain?
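
For what it's worth: a PostgreSQL cursor generally does not materialize its full result set up front, so there is no fixed row limit; rows are produced as they are fetched. A sketch (table name illustrative):

```sql
BEGIN;
DECLARE big_cur CURSOR FOR SELECT * FROM big_table;
FETCH 1000 FROM big_cur;   -- pull rows in manageable batches
CLOSE big_cur;
COMMIT;
```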


Nicolas, having a lot of things to learn

--

-
"Be what you would like to seem to be" (Lewis Carroll)


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] General DB Tuning

2005-07-13 Thread Mark Rae
On Wed, Jul 13, 2005 at 09:52:20AM +0800, Christopher Kings-Lynne wrote:
> The 8.0.2 jdbc driver uses real prepared statements instead of faked 
> ones.  The problem is the new protocol (that the 8.0.2 driver uses) has 
> a bug where protocol-prepared queries don't get logged properly.
> I don't know if it's been fixed...

It's not in 8.0.3, but I was having the same problems with DBD::Pg so
I backported some of it and also changed the code so that it listed the
values of the bind parameters, so you get something like

LOG:  statement: SELECT sr.name,sr.seq_region_id, sr.length, 1 FROM seq_region 
sr  WHERE sr.name = $1 AND sr.coord_system_id = $2
LOG:  binding: "dbdpg_2" with 2 parameters
LOG:  bind "dbdpg_2" $1 = "20"
LOG:  bind "dbdpg_2" $2 = "1"
LOG:  statement: EXECUTE   [PREPARE:  SELECT sr.name,sr.seq_region_id, 
sr.length, 1 FROM seq_region sr  WHERE sr.name = $1 AND sr.coord_system_id = $2]
LOG:  duration: 0.164 ms

I've attached a patch in case anyone finds it useful.

-Mark
*** postgresql-8.0.3/src/backend/tcop/postgres.c	2005-07-13 09:42:04.997669193 +0100
--- postgresql-8.0.3/src/backend/tcop/postgres.c	2005-07-13 09:34:24.618195580 +0100
***
*** 1370,1375 
--- 1370,1378 
else
portal = CreatePortal(portal_name, false, false);
  
+   if (log_statement == LOGSTMT_ALL)
+       ereport(LOG, (errmsg("binding: \"%s\" with %d parameters", stmt_name, numParams)));
+ 
/*
 * Fetch parameters, if any, and store in the portal's memory context.
 *
***
*** 1428,1433 
--- 1431,1439 
 * grotty but is a big win when dealing 
with very
 * large parameter strings.
 */
+   if (log_statement == LOGSTMT_ALL)
+       ereport(LOG, (errmsg("bind \"%s\" $%d = \"%s\"", stmt_name, i+1, pvalue)));
+ 
pbuf.data = (char *) pvalue;
pbuf.maxlen = plength + 1;
pbuf.len = plength;
***
*** 1578,1583 
--- 1584,1593 
boolis_trans_exit = false;
boolcompleted;
charcompletionTag[COMPLETION_TAG_BUFSIZE];
+   struct timeval start_t, stop_t;
+   boolsave_log_duration = log_duration;
+   int save_log_min_duration_statement = log_min_duration_statement;
+   boolsave_log_statement_stats = log_statement_stats;
  
/* Adjust destination to tell printtup.c what to do */
dest = whereToSendOutput;
***
*** 1614,1619 
--- 1624,1647 
  
set_ps_display(portal->commandTag);
  
+   /*
+* We use save_log_* so "SET log_duration = true"  and "SET
+* log_min_duration_statement = true" don't report incorrect time
+* because gettimeofday() wasn't called. Similarly,
+* log_statement_stats has to be captured once.
+*/
+   if (save_log_duration || save_log_min_duration_statement != -1)
+   gettimeofday(&start_t, NULL);
+ 
+   if (save_log_statement_stats)
+   ResetUsage();
+ 
+   if (log_statement == LOGSTMT_ALL)
+   /* We have the portal, so output the source query. */
+   ereport(LOG,
+   (errmsg("statement: EXECUTE %s  [PREPARE:  %s]", portal_name,
+   portal->sourceText ? portal->sourceText : "")));
+ 
BeginCommand(portal->commandTag, dest);
  
/* Check for transaction-control commands */
***
*** 1708,1713 
--- 1736,1785 
pq_putemptymessage('s');
}
  
+   /*
+* Combine processing here as we need to calculate the query duration
+* in both instances.
+*/
+   if (save_log_duration || save_log_min_duration_statement != -1)
+   {
+   longusecs;
+ 
+   gettimeofday(&stop_t, NULL);
+   if (stop_t.tv_usec < start_t.tv_usec)
+   {
+   stop_t.tv_sec--;
+   stop_t.tv_usec += 1000000;
+   }
+   usecs = (long) (stop_t.tv_sec - start_t.tv_sec) * 1000000 +
+   (long) (stop_t.tv_usec - start_t.tv_usec);
+ 
+   /* Only print duration if we previously printed the statement. */
+   if (log_statement == LOGSTMT_ALL && save_log_duration)
+   ereport(LOG,
+   (errmsg("duration: %ld.%03ld ms",
+   (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 +
+   (stop_t.tv_usec - start_t.tv_usec) / 1000),
+   (long) (stop_t.tv_usec - start_t.tv_usec) % 1000)));

[PERFORM] large table vs multiple small tables

2005-07-13 Thread Nicolas Beaume

Hello

I have a large database with 4 large tables (each containing at least 
200,000 rows, perhaps even 1 or 2 million) and I wonder whether it's 
better to split them into small tables (e.g. tables of 2,000 rows) to 
speed up access to and updates of those tables (considering that I will 
have few updates but a lot of reads).


Do you think it would be efficient ?

Nicolas, wondering if he hadn't been too greedy

--

-
"Be what you would like to seem to be" (Lewis Carroll)


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] size of cache

2005-07-13 Thread Jean-Max Reymond
with my application, it seems that size of cache has great effect:
from 512 Kb of L2 cache to 1Mb boost performance with a factor 3 and
20% again from 1Mb L2 cache to 2Mb L2 cache.
I don't understand why a 512Kb cache L2 is too small to fit the data's
does it exist a tool to trace processor activity and confirm that
processor is waiting for memory ?
does it exist a tool to snapshot postgres activity and understand
where we spend time and potentialy avoid the bottleneck ?

thanks for your tips.

-- 
Jean-Max Reymond
CKR Solutions Open Source
Nice France
http://www.ckr-solutions.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq