Re: [PERFORM] Terrible Write Performance of a Stored Procedure

2009-06-29 Thread Scott Mead
>
> You're right that it should be removed, but this explanation is wrong. The
> behavior as configured is actually "if there are >=100 other transactions in
> progress, wait 0.1 second before committing after the first one gets
> committed", in hopes that one of the other 100 might also join along in the
> disk write.


  Thanks for the correction. My question is: how are you getting 0.1 seconds
from his commit_delay?

if (CommitDelay > 0 && enableFsync &&
    CountActiveBackends() >= CommitSiblings)
    pg_usleep(CommitDelay);     /* CommitDelay is in microseconds */

  Wouldn't this actually be 1 second based on a commit_delay of 10?



>
>
> Since in this case max_connections is set to 100, it's actually impossible
> for the commit_delay/commit_siblings behavior to trigger given this
> configuration.  That's one reason it should be removed.  The other is that in
> general, if you don't know exactly what you're doing, you shouldn't be
> touching either parameter; they don't do what people expect them to, and it's
> extremely unlikely you'll encounter any of the rare use cases where they
> might help.


   After looking, I agree. Thanks again for the correction, Greg.

--Scott


[PERFORM] random slow query

2009-06-29 Thread Sean Ma
I have a table of about 50 million rows. A few writers pump data into
the table at a rate of 4 rows/hour. Most of the time, the SELECT takes
less than 100 ms. However, sometimes it is very slow, from 30
seconds to 500 seconds. The database is vacuumed and analyzed regularly.

A month ago, this type of slow query happened a few times per day. But
recently, the slow queries have become more frequent, at a rate of once
every 10 minutes or less. There seems to be no relation to the database
load or the type of query. If I manually execute these queries, they
return in less than a second.

I just wonder: where should I start looking?

Thanks

Shawn.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] random slow query

2009-06-29 Thread Andres Freund

On 06/29/2009 03:33 PM, Sean Ma wrote:

I have a table of about 50 million rows. A few writers pump data into
the table at a rate of 4 rows/hour. Most of the time, the SELECT takes
less than 100 ms. However, sometimes it is very slow, from 30
seconds to 500 seconds. The database is vacuumed and analyzed regularly.

A month ago, this type of slow query happened a few times per day. But
recently, the slow queries have become more frequent, at a rate of once
every 10 minutes or less. There seems to be no relation to the database
load or the type of query. If I manually execute these queries, they
return in less than a second.

I just wonder: where should I start looking?

The slow queries could be waiting for locks, so you could enable
log_lock_waits to see if that is the issue.
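
A hedged sketch of what that looks like (log_lock_waits first appeared in
8.3 and is superuser-only; the threshold shown is simply the default):

  SET log_lock_waits = on;      -- log any lock wait longer than deadlock_timeout
  SET deadlock_timeout = '1s';  -- the wait threshold; 1s is the default

To cover every session, set the same parameters in postgresql.conf instead.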


Andres


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Utilizing multiple cores in a function call.

2009-06-29 Thread Hartman, Matthew
Good morning.

I have developed a function call that schedules patient appointments
within a day based on several resource constraints. The algorithm has
been mentioned on here before, and I have managed to tweak it down to 6-9
seconds from the original 27 seconds.

Of course, I want it to be faster still. The function throttles one of
my CPUs to 100% (shown as 50% in Task Manager) and leaves the other one
sitting pretty. Is there any way to use both CPUs?

Thanks,

Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital
(613) 549- x4294


Re: [PERFORM] Utilizing multiple cores in a function call.

2009-06-29 Thread Jean-David Beyer

Hartman, Matthew wrote:

Good morning.

I have developed a function call that schedules patient appointments
within a day based on several resource constraints. The algorithm has
been mentioned on here before, and I have managed to tweak it down to 6-9
seconds from the original 27 seconds.



To speed up the execution of processes, I heartily recommend the book,
"Writing Efficient Programs" by Jon Louis Bentley, Prentice-Hall, 1982.

There are many important steps. The most important is usually to refine the
algorithm itself. I once sped up a program from several weeks of projected
runtime on a mainframe running 24/7 down to 6 minutes by improving its basic
algorithm. Only then would it have made sense to optimize the actual code.

Next, you need to profile the code to see where the hot spots are. There is
little point in examining code in other parts of the program.
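
If the hot spots are inside a PL/pgSQL function, a crude timer is often all
you need. A sketch (not from the thread; the timed query is just a stand-in
for a suspect section):

  CREATE OR REPLACE FUNCTION timing_demo() RETURNS void AS $$
  DECLARE
      t0 timestamptz;
  BEGIN
      t0 := clock_timestamp();
      PERFORM count(*) FROM generate_series(1, 1000000);  -- suspect section
      RAISE NOTICE 'section took %', clock_timestamp() - t0;
  END;
  $$ LANGUAGE plpgsql;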


Of course, I want it to be faster still. The function throttles one of 
my CPUs to 100% (shown as 50% in Task Manager) and leaves the other one 
sitting pretty. Is there any way to use both CPUs?



You could write your algorithm as a separate process -- a server.
Then in your SQL program, you invoke a trivial function that just hands the
arguments off to the server. Thus, your SQL program would normally run on
one processor and the time-consuming algorithm would run on the other.

If you are not careful, this would not benefit you at all, because your SQL
process would wait until the server returns its answer. So you would need to
modify your SQL program so that it could do other things while the server
process did its thing.

My guess is that you need a more efficient algorithm before you go to the
trouble of optimizing the execution of your current one. As far as making it
run on multiple processors, it depends critically on the nature of your
algorithm. A few can easily be modified to run on multiple processors. Some
cannot run on multiple processors at all.

Thanks,

Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital
(613) 549- x4294




--
  .~.  Jean-David Beyer  Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey  http://counter.li.org
 ^^-^^ 10:40:01 up 10 days, 21:29, 3 users, load average: 4.19, 4.22, 4.19

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] random slow query

2009-06-29 Thread Kevin Grittner
Sean Ma wrote: 
> I have a table of about 50 million rows. A few writers pump data
> into the table at a rate of 4 rows/hour. Most of the time, the
> SELECT takes less than 100 ms. However, sometimes it is very slow,
> from 30 seconds to 500 seconds. The database is vacuumed and
> analyzed regularly.
 
What version of PostgreSQL is this?  On what OS?  What hardware?
 
We had similar problems on some of our servers under 8.2 and earlier
due to the tendency of PostgreSQL to build up a very large set of
dirty pages and then throw them all at the drives with an immediate
fsync.  The RAID controller queued up the requests, and fast reads got
stuck in the queue behind all those writes.  You may want to look at
this excellent coverage of the topic by Greg Smith:
 
http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm
 
We got around the problem by keeping the checkpoint interval and
shared buffer size fairly small, and making the background writer
fairly aggressive.  What works for you, if this is your problem, may
be different.  I've heard that some have had to tune their OS caching
configuration.
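
For flavor, the kind of settings Kevin describes might look like this in
postgresql.conf (the values are illustrative, not taken from his servers):

  checkpoint_timeout = 5min      # checkpoint often, so less dirty data piles up
  checkpoint_segments = 3        # keep the WAL-driven checkpoint interval short too
  shared_buffers = 128MB         # fairly small shared buffer cache
  bgwriter_delay = 200ms         # wake the background writer frequently
  bgwriter_lru_maxpages = 500    # and let it write aggressively on each round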
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Utilizing multiple cores in a function call.

2009-06-29 Thread Hartman, Matthew
I'm pretty much at the point where I've chewed the fat off of the
algorithm, or at least reached my personal limits. Occasionally a new idea
pops into my head and yields an improvement, but it's on the order of
100-250 ms.

Google came back with "no sir". It seems PostgreSQL is limited to one
CPU per query unless I spawn a master/controller like you suggested.
Shame.


Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital
(613) 549- x4294 
 

-Original Message-
From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Jean-David
Beyer
Sent: Monday, June 29, 2009 10:53 AM
To: pgsql performance
Subject: Re: [PERFORM] Utilizing multiple cores in a function call.

[quoted text trimmed]


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Utilizing multiple cores in a function call.

2009-06-29 Thread Joe Conway

Hartman, Matthew wrote:

I'm pretty much at the point where I've chewed the fat off of the
algorithm, or at least reached my personal limits. Occasionally a new idea
pops into my head and yields an improvement, but it's on the order of
100-250 ms.

Google came back with "no sir". It seems PostgreSQL is limited to one
CPU per query unless I spawn a master/controller like you suggested.
Shame.


Although I have never done it myself, you might try using PL/R to 
perform the algorithm in R, and make use of the snow package to run 
parallel tasks -- see:

  http://cran.r-project.org/web/views/HighPerformanceComputing.html

Joe


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Utilizing multiple cores in a function call.

2009-06-29 Thread Greg Smith

On Mon, 29 Jun 2009, Hartman, Matthew wrote:

The function throttles one of my CPUs to 100% (shown as 50% in Task 
Manager) and leaves the other one sitting pretty. Is there any way to 
use both CPUs?


Not easily.  Potential techniques:

-Rewrite the function or its time-critical portion in some other language 
that allows using two processes usefully.


-Write a "worker server" that you prompt to pick up work from a table and 
write its output to another, so that you can ask it to handle part of the 
job.  You might communicate with the worker using the LISTEN/NOTIFY 
mechanism in the database (a sketch follows this list).


-Some combination of these two techniques.  One popular way to speed up 
things that are running slowly is to run some part of them in a C UDF, so 
that you could use "select my_big_computation(x,y,z)" and get faster 
execution.
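
A minimal sketch of the worker-server idea from the second point (all names
are invented, and NOTIFY carries no payload here, so the worker re-reads the
queue table when it wakes up):

  -- shared structures
  CREATE TABLE job_queue (
      id      serial PRIMARY KEY,
      payload text NOT NULL,
      done    boolean NOT NULL DEFAULT false
  );
  CREATE TABLE job_result (
      job_id int PRIMARY KEY REFERENCES job_queue (id),
      result text
  );

  -- client session: enqueue work and wake the worker
  INSERT INTO job_queue (payload) VALUES ('schedule day 2009-06-29');
  NOTIFY job_ready;

  -- worker session: sleep on the notification instead of polling
  LISTEN job_ready;
  -- on wakeup, claim the oldest open job:
  --   SELECT id, payload FROM job_queue WHERE NOT done ORDER BY id LIMIT 1 FOR UPDATE;
  -- compute, then hand back the answer:
  --   INSERT INTO job_result VALUES (1, '...');
  --   UPDATE job_queue SET done = true WHERE id = 1;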


If you were hoping for a quick answer, there is no such thing.  I suspect 
you'd get better help by talking about what your function does and seeing 
if there's a specific part somebody else is familiar with optimizing.


For example, I've seen >10:1 speedups just by rewriting one small portion 
of a computationally expensive mathematical function in C before, keeping 
the rest of the logic on the database side.  You don't necessarily have to 
rewrite the whole thing.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Utilizing multiple cores in a function call.

2009-06-29 Thread Merlin Moncure
On Mon, Jun 29, 2009 at 10:26 AM, Hartman,
Matthew wrote:
> Good morning.
>
>
>
> I have developed a function call that schedules patient appointments within
> a day based on several resource constraints. The algorithm has been
> mentioned on here before and I have managed to tweak it down to 6-9 seconds
> from the original 27 seconds.
>
>
>
> Of course, I want it to be faster still. The function throttles one of my
> CPUs to 100% (shown as 50% in Task Manager) and leaves the other one sitting
> pretty. Is there any way to use both CPUs?

Your best bet at using multiple cores on a CPU-bound problem is to try
to divide up the work logically into separate pools and to attack the
work with multiple function calls, as in the sketch below.  This is
probably what the database would do for you if it had 'in-query
multithreading', only it could attack the problem at a much finer-grained
level.
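
For instance (the function name and arguments are hypothetical -- the thread
never shows the real signature), two concurrent sessions could each take a
logical slice of the day:

  -- session 1:
  SELECT schedule_appointments('2009-06-29'::date, 'morning');
  -- session 2, at the same time on another connection:
  SELECT schedule_appointments('2009-06-29'::date, 'afternoon');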

In your particular case, I think the answer is to attack the problem
in an entirely new direction, although your matrix query is one of the
coolest queries I've seen in a while.

The first thought that jumped out at me was to treat your nurses and
stations as incrementing numbers, so that if you allocate three hours of
nurse x's time, you increment some number by three in the nurses table.
This would sit on top of a kind of time-calculation system that converts
that number to actual time based on the nurse's schedule, etc.  On top of
_that_, you would need some kind of resolution system to handle canceled
appointments, nurse no-shows, etc.

The stations would operate on a similar principle: you imagine all
the available hours for the station stretched to infinity on a number
line and keep a fixed allocation point which always moves forward,
plus a 'number line time' -> real time converter and a freestore list
to pick up unexpectedly freed time.
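
A hypothetical sketch of the nurse counter (the schema is invented for
illustration, and UPDATE ... RETURNING needs 8.2 or later): claiming three
hours becomes a single atomic increment, and the returned offset is what
the 'number line time' -> real time converter would map onto the nurse's
actual schedule.

  CREATE TABLE nurse (
      id                serial PRIMARY KEY,
      name              text NOT NULL,
      allocated_minutes int NOT NULL DEFAULT 0  -- position on the number line
  );

  -- claim three hours of nurse 42's time, reporting where the block starts
  UPDATE nurse
     SET allocated_minutes = allocated_minutes + 180
   WHERE id = 42
  RETURNING allocated_minutes - 180 AS start_offset;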

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] random slow query

2009-06-29 Thread Mike Ivanov

Hi Sean,

Sean Ma wrote:

A month ago, this type of slow query happened a few times per day. But
recently, the slow queries have become more frequent, at a rate of once
every 10 minutes or less. There seems to be no relation to the database
load or the type of query.


What is your hardware (memory, CPU type and such)?

This seems like a cache issue to me, but I can't tell for sure without 
some additional information on your system:


1) What is the amount of a) available memory, b) free memory, and c) 
memory available to I/O buffers?


2) What is the swap usage if any?

3) What is the CPU load? Any noticeable patterns in CPU load?

You can use /usr/bin/top to obtain most of this information.

Mike



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Utilizing multiple cores in a function call.

2009-06-29 Thread Craig Ringer
On Mon, 2009-06-29 at 14:42 -0400, Greg Smith wrote:

> -Write a "worker server" that you prompt to pick up work from a table and 
> write its output to another, so that you can ask it to handle part of the 
> job.  You might communicate with the worker using the LISTEN/NOTIFY 
> mechanism in the database.
> 
> -Some combination of these two techniques.  One popular way to speed up 
> things that are running slowly is to run some part of them in a C UDF, so 
> that you could use "select my_big_computation(x,y,z)" and get faster 
> execution.

The trouble here is that the backend may not like having threads
suddenly introduced into its execution environment.

If properly written, I don't really see why a C UDF that used pthreads
couldn't spawn two worker threads that _NEVER_ touch _ANY_ PostgreSQL
APIs, never talk to the SPI, etc., and let them run while blocking the
main thread until they complete.

Then again, I know relatively little about Pg's guts, and for all I know
initing the pthread environment could completely mess up the backend.


Personally I'd want to do it out-of-process, using a SECURITY DEFINER
PL/PgSQL function owned by a role that also owned some otherwise private
queue and result tables for your worker server. As Greg Smith noted,
LISTEN/NOTIFY would allow your worker server to avoid polling and
instead sleep when there's nothing in the queue, and would also let your
waiting clients avoid polling the result table.

> For example, I've seen >10:1 speedups just by rewriting one small portion 
> of a computationally expensive mathematical function in C before, keeping 
> the rest of the logic on the database side.  You don't necessarily have to 
> rewrite the whole thing.

A useful dirty trick is to use Psyco in Python. It's a specializing
compiler that can get massive performance boosts out of Python code
without any code changes, and it seems to work with PL/Python. Just:

try:
  import psyco
  psyco.full()
except:
  # Enabling Psyco failed; don't care
  pass

in your function should get you a pretty serious boost. This will NOT,
however, allow your code to use two cores at once; you'll need threading
or multiple processes for that.

-- 
Craig Ringer


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance