Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-21 Thread Jignesh K. Shah



Robert Haas wrote:

On Fri, Mar 20, 2009 at 7:39 PM, Jignesh K. Shah j.k.s...@sun.com wrote:
  

Alvaro Herrera wrote:


So Simon's correct.


And perhaps this explains why Jignesh is measuring an improvement on his
benchmark.  Perhaps a useful experiment would be to turn this behavior
off and compare performance.  This lack of measurement is probably why
the suggested patch to fix it was never applied.

The patch is here
http://archives.postgresql.org//pgsql-hackers/2004-11/msg00935.php
  

One of the reasons why my patch helps is that it keeps this check intact
but allows other exclusive waiters to wake up. Note that what PostgreSQL
calls a "wakeup" in reality just sets a variable indicating wakeup; it
does not actually signal a process to wake up. This is a key point to
note. So when the process wanting the exclusive lock fights the OS
scheduling policy to finally get time on the CPU, it then checks the
value to see if it is allowed to wake up, and potentially



I'm confused.  Is a process waiting for an LWLock in a runnable
state?  I thought we went to sleep on a semaphore.

...Robert

  

If you check the code
http://doxygen.postgresql.org/lwlock_8c-source.html#l00451

a process blocked on the semaphore can wake up, but it then has to
confirm !proc->lwWaiting, which can still be TRUE if it has not been
"waked" by a lock releaser; in that case it increments the extraWaits
count and goes back into PGSemaphoreLock. What my patch gives, with
sequential X wakeups, is the flexibility for such a process to still
exit the wait, check whether it can get the exclusive lock, and if not
add itself back to the queue. My theory is that when a process is
already running on a CPU it makes sense to let it check for the lock
even while another exclusive holder is running, since the chance that
the holder has completed within a few cycles is very high, and the
improvement that I see leads to that inference. Otherwise, if lwWaiting
is TRUE, the process does not even check whether the lock is available;
it just goes back and waits for the next chance. This is the part that
benefits from my patch.


-Jignesh


--
Jignesh Shah   http://blogs.sun.com/jkshah  
The New Sun Microsystems,Inc   http://sun.com/postgresql


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-20 Thread Jignesh K. Shah



Alvaro Herrera wrote:

Alvaro Herrera wrote:

  

So Simon's correct.



And perhaps this explains why Jignesh is measuring an improvement on his
benchmark.  Perhaps a useful experiment would be to turn this behavior
off and compare performance.  This lack of measurement is probably why
the suggested patch to fix it was never applied.

The patch is here
http://archives.postgresql.org//pgsql-hackers/2004-11/msg00935.php

  
One of the reasons why my patch helps is that it keeps this check intact 
but allows other exclusive waiters to wake up. Note that what PostgreSQL 
calls a "wakeup" in reality just sets a variable indicating wakeup; it 
does not actually signal a process to wake up. This is a key point to 
note. So when the process wanting the exclusive lock fights the OS 
scheduling policy to finally get time on the CPU, it then checks the 
value to see if it is allowed to wake up. Due to the delay between when 
some other process marked it "waked up" and when it actually checks that 
value, it is likely that the lock is free by then (or another exclusive 
process acquired the lock, did its work, and released it).  Overall it 
works well, since it stays within the logical semantics of the locks and 
just exploits differences in OS scheduling and inherent delays in the 
system.


It actually makes sense: if a process is on the CPU wanting an exclusive 
lock while someone else holds an exclusive lock, let it try to get the 
lock rather than preventing it from trying. The lock semantics make sure 
an exclusive lock is never granted to two processes, so there is no 
issue with letting it try.


It's late on a Friday so I won't be able to explain it better, but when 
load is heavy, getting on the CPU is an achievement; let processes try 
for an exclusive lock while they are already there.


Try it!!

-Jignesh




Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-19 Thread Jignesh K. Shah



Robert Haas wrote:

Actually, the patch I submitted shows no overhead from what I have seen,
and I think it is useful; depending on the workload it can be turned on
even in production.



Well, unless I'm misunderstanding something, waking all waiters every
time could lead to arbitrarily long delays for writers on mostly
read-only workloads... and by "arbitrarily long" we mean potentially
just about forever.  That doesn't sound safe for production to me.

  


Hi Robert,
The patch I submitted does not do any manipulation of the wait list. All 
it changes is to give you the flexibility to choose how many waiters to 
wake up at one go. The default, 0, wakes up only 1 X (exclusive) waiter 
at a time, or all sequential S (shared) waiters. Setting it to 1 wakes 
up all sequential X or all sequential S waiters, in queue order (no 
reordering). Values 2 through 32 wake up the next n waiters in the 
queue (X or S), again in queue order. It does absolutely no reordering, 
and hence there is no overhead.  It is absolutely safe for production; 
as Scott mentioned, there are other settings in postgresql.conf that 
can be more dangerous than this tunable.



I don't think anything is majorly wrong in my system. Sometimes
PostgreSQL locks are in play and sometimes OS/system-level locks are in
play (network, IO, file system, etc.).  Right now in my patch, after I fix
the ProcArray wait problem, other PostgreSQL locks come into play:
CLogControlLock, WALInsertLock, etc.  Right now, out of the box, we have no
means of tweaking anything in production if you do land in that problem.
With the patch there is a knob to tweak the lock bottlenecks for the
main workload for which the system is put in production.



I'll reiterate my previous objection: I think your approach is too
simplistic.  I think Tom said it the best: a lot of work has gone into
making the locking mechanism lightweight and safe.  I'm pretty
doubtful that you're going to find a change that is still safe, but
performs much better.  The discussions by Heikki, Simon, and others
about changing the way locks are used or inventing new kinds of locks
seem much more promising to me.

  
That is the beauty: the approach is simplistic but very effective. A lot 
of work has gone in that is more incremental, and this is another one of 
those incremental changes; it allows minor tweaks which the workload 
may like very much and perform very well with. The performance-tuning 
game is almost like finding a harmonic frequency.  I agree that other 
kinds of locks seem more promising. In fact, I proposed one last year too:

http://archives.postgresql.org//pgsql-hackers/2008-06/msg00291.php

Seriously speaking, such a change definitely cannot be done before the 
8.5 time frame, while this one is simple enough to go into 8.4.
The best thing one can contribute to this thread is to actually try the 
patch on a test system and run your own tests to see how it behaves.


-Jignesh


Right now the standard answer applies: nope, you are running the wrong
workload for PostgreSQL; use a connection pooler or your own application
logic. Or maybe you have too many users for PostgreSQL; use some
proprietary database.



Well I certainly agree that we need to get away from that mentality,
although there's nothing particularly evil about a connection
pooler... it might not be suitable for every workload, but you haven't
specified why one couldn't or shouldn't be used in the situation
you're trying to simulate here.

...Robert
  




Re: [PERFORM] Postgres benchmarking with pgbench

2009-03-19 Thread Jignesh K. Shah



m...@bortal.de wrote:

Hi Greg,

thanks a lot for your hints. I changed my config and switched from raid6 
to raid10, but whatever I do, the benchmark breaks down at a scaling 
factor of 75, where the database is only 1126MB big.

Here are my benchmark Results (scaling factor, DB size in MB, TPS) using:
  pgbench -S -c  X  -t 1000 -U pgsql -d benchmark -h MYHOST

scale   DB size (MB)   TPS
  1        19         8600
  5        79         8743
 10       154         8774
 20       303         8479
 30       453         8775
 40       602         8093
 50       752         6334
 75      1126         3881
150      2247         2297
200      2994          701
250      3742          656
300      4489          596
400      5984          552
500      7479          513

I have no idea if this is any good for a quad-core Intel(R) Xeon(R) 
CPU E5320 @ 1.86GHz with 4GB RAM and 6 SATA disks (7200rpm) in raid 10.


Here is my config (maybe with some odd setting): 
http://pastebin.com/m5d7f5717


I played around with:
- max_connections
- shared_buffers
- work_mem
- maintenance_work_mem
- checkpoint_segments
- effective_cache_size

..but whatever I do, the graph looks the same. Any hints or tips on what 
my config should look like? Or are these results even okay? Maybe I am 
driving myself crazy for nothing?


Are you running the pgbench client from a different system? Did you 
check whether the pgbench client itself is bottlenecked? I have seen 
before that the pgbench client is severely limited in the load it can 
drive and process.



-Jignesh




Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Jignesh K. Shah



On 03/18/09 08:06, Simon Riggs wrote:

On Wed, 2009-03-18 at 11:45 +, Matthew Wakeling wrote:
  

On Wed, 18 Mar 2009, Simon Riggs wrote:


I agree with that, apart from the "granting no more" bit.

The most useful behaviour is just to have two modes:
* exclusive-lock held - all other x locks welcome, s locks queue
* shared-lock held - all other s locks welcome, x locks queue
  
The problem with making all other locks welcome is that there is a 
possibility of starvation. Imagine a case where there is a constant stream 
of shared locks: the exclusive lockers may never actually get hold of the 
lock under the "all other shared locks welcome" strategy. 



That's exactly what happens now. 

  

Likewise with the reverse.



I think it depends upon how frequently requests arrive. Commits cause X
locks and we don't commit that often, so its very unlikely that we'd see
a constant stream of X locks and prevent shared lockers.


Some comments from an earlier post on this topic (about 20 months ago):

Since shared locks are currently queued behind exclusive requests
when they cannot be immediately satisfied, it might be worth
reconsidering the way LWLockRelease works also. When we wake up the
queue we only wake the Shared requests that are adjacent to the head of
the queue. Instead we could wake *all* waiting Shared requestors.

e.g. with a lock queue like this:
(HEAD)  S-S-X-S-X-S-X-S
Currently we would wake the 1st and 2nd waiters only. 


If we were to wake the 3rd, 5th and 7th waiters also, then the queue
would reduce in length very quickly, if we assume generally uniform
service times. (If the head of the queue is X, then we wake only that
one process and I'm not proposing we change that). That would mean queue
jumping right? Well thats what already happens in other circumstances,
so there cannot be anything intrinsically wrong with allowing it, the
only question is: would it help? 

  


I thought about that, except that without putting a restriction on it, a 
huge queue will cause a lot of time to be spent manipulating the lock 
list every time. One more option would be to maintain two lists, shared 
and exclusive, and round-robin through them each time you access the 
list, so manipulation stays cheap. But the best thing is to allow the 
flexibility to change the algorithm, since some workloads may work fine 
with one and others will NOT. The flexibility then allows those already 
reaching the limits to tinker.


-Jignesh


We need not wake the whole queue, there may be some generally more
beneficial heuristic. The reason for considering this is not to speed up
Shared requests but to reduce the queue length and thus the waiting time
for the Xclusive requestors. Each time a Shared request is dequeued, we
effectively re-enable queue jumping, so a Shared request arriving during
that point will actually jump ahead of Shared requests that were unlucky
enough to arrive while an Exclusive lock was held. Worse than that, the
new incoming Shared requests exacerbate the starvation, so the more
non-adjacent groups of Shared lock requests there are in the queue, the
worse the starvation of the exclusive requestors becomes. We are
effectively randomly starving some shared locks as well as exclusive
locks in the current scheme, based upon the state of the lock when they
make their request. The situation is worst when the lock is heavily
contended and the workload has a 50/50 mix of shared/exclusive requests,
e.g. serializable transactions or transactions with lots of
subtransactions.

  


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Jignesh K. Shah



On 03/18/09 17:16, Scott Carey wrote:

On 3/18/09 4:36 AM, Gregory Stark st...@enterprisedb.com wrote:

  

Jignesh K. Shah j.k.s...@sun.com writes:



In the next couple of weeks I plan to test the patch on a different x64-based
system, to do sanity testing on a lower number of cores and also try out other
workloads ...
  

I'm actually more interested in the large number of cores but fewer processes
and lower max_connections. If you set max_connections to 64 and eliminate the
wait time you should, in theory, be able to get 100% cpu usage. It would be
very interesting to track down the contention which is preventing that.



My previous calculation in this thread showed that even at 0 wait time, the
client seems to introduce ~3ms of overhead on average.  So it takes close to
128 threads in each test to stop the linear scaling, since the average
processing time seems to be about ~3ms.
Either that, or the tests are actually running on a system capable of 128
threads.

  


Nope, 64 threads for sure. Verified it a number of times.

-Jignesh


--
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!






  


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Jignesh K. Shah



On 03/18/09 17:25, Robert Haas wrote:

On Wed, Mar 18, 2009 at 1:43 PM, Scott Carey sc...@richrelevance.com wrote:
  

It's worth ruling out, given that even if the likelihood is small, the fix is
easy.  However, I don't see the throughput drop from peak as more
concurrency is added that is the hallmark of this problem, usually seen with a
lot of context switching and a sudden increase in CPU use per transaction.


The problem is that the proposed fix bears a strong resemblence to
attempting to improve your gas mileage by removing a few non-critical
parts from your car, like, say, the bumpers, muffler, turn signals,
windshield wipers, and emergency brake.
  

The fix I was referring to as easy was using a connection pooler, as a
reply to the previous post. Even if it's a low likelihood that the connection
pooler fixes this case, it's worth looking at.



Oh, OK.  There seem to be some smart people saying that's a pretty
high-likelihood fix.  I thought you were talking about the proposed
locking change.

  

While it's true that the car
might be drivable in that condition (as long as nothing unexpected
happens), you're going to have a hard time convincing the manufacturer
to offer that as an options package.
  

The original poster's request is for a config parameter, for experimentation
and testing by the brave. My own request was for that version of the lock to
prevent possible starvation but improve performance by unlocking all shared
at once, then doing all exclusives one at a time next, etc.



That doesn't prevent starvation in general, although it will for some workloads.

Anyway, it seems rather pointless to add a config parameter that isn't
at all safe, and adds overhead to a critical part of the system for
people who don't use it.  After all, if you find that it helps, what
are you going to do?  Turn it on in production?  I just don't see how
this is any good other than as a thought-experiment.
  


Actually, the patch I submitted shows no overhead from what I have seen, 
and I think it is useful; depending on the workload it can be turned on 
even in production.

At any rate, as I understand it, even after Jignesh eliminated the
waits, he wasn't able to push his CPU utilization above 48%.  Surely
something's not right there.  And he also said that when he added a
knob to control the behavior, he got a performance improvement even
when the knob was set to 0, which corresponds to the behavior we have
already anyway.  So I'm very skeptical that there's something wrong
with either the system or the test.  Until that's understood and
fixed, I don't think that looking at the numbers is worth much.

  


I don't think anything is majorly wrong in my system. Sometimes 
PostgreSQL locks are in play and sometimes OS/system-level locks are in 
play (network, IO, file system, etc.).  Right now in my patch, after I 
fix the ProcArray wait problem, other PostgreSQL locks come into play: 
CLogControlLock, WALInsertLock, etc.  Right now, out of the box, we have 
no means of tweaking anything in production if you do land in that 
problem. With the patch there is a knob to tweak the lock bottlenecks 
for the main workload for which the system is put in production.


I still haven't seen any downsides with the patch other than it 
highlighting other bottlenecks in the system. (For example, I haven't 
seen a run where the tpm on my workload decreases as you increase the 
setting.) What I am suggesting is: run the patch and see if you find a 
workload where you see a downside in performance, and check the lock 
statistics output to see whether it is pushing the bottleneck elsewhere, 
most likely to WALInsertLock or CLogControlLock. If so, this patch gives 
you the right tweaking opportunity to reduce stress on ProcArrayLock for 
a workload while still not seriously stressing WALInsertLock or 
CLogControlLock.


Right now the standard answer applies: nope, you are running the wrong 
workload for PostgreSQL; use a connection pooler or your own application 
logic. Or maybe you have too many users for PostgreSQL; use some 
proprietary database.


-Jignesh





I alluded to the three main ways of dealing with lock contention elsewhere.
Avoiding locks, making finer grained locks, and making locks faster.
All are worthy.  Some are harder to do than others.  Some have been heavily
tuned already.  Its a case by case basis.  And regardless, the unfair lock
is a good test tool.



In view of the caveats above, I'll give that a firm maybe.

...Robert
  


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-17 Thread Jignesh K. Shah



Simon Riggs wrote:

On Wed, 2009-03-11 at 22:20 -0400, Jignesh K. Shah wrote:

  

A tunable does not impact existing behavior



Why not put the tunable parameter into the patch and then show the test
results with it in? If there is no overhead, we should then be able to
see that.

  

Can do, though I will need a quick primer on adding tunables.
Is it on wiki.postgresql.org anywhere?

-Jignesh





Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-17 Thread Jignesh K. Shah



On 03/16/09 13:39, Simon Riggs wrote:

On Wed, 2009-03-11 at 22:20 -0400, Jignesh K. Shah wrote:


A tunable does not impact existing behavior


Why not put the tunable parameter into the patch and then show the test
results with it in? If there is no overhead, we should then be able to
see that.




I did a patch that defines lock_wakeup_algorithm, with a default value of 
0 and a range of 0 to 32.
It basically handles three types of algorithms (32 different 
permutations), such that when lock_wakeup_algorithm is set to:

0            = default wakeup logic (only 1 exclusive, or all 
sequential shared)
1            = wake up all sequential exclusives or all sequential 
shared
2 <= n <= 32 = wake up the first n waiters, irrespective of exclusive 
or shared




I did a quick test with the patch. Unfortunately it improves my numbers 
even with the default setting 0 (not sure whether I should be pleased or 
sad). Definitely no overhead; in fact it seems to help performance a bit. 
(NOTE: the logic is the same for the default setting, but the 
implementation is slightly different.)


My pre-patch numbers typically peaked around 136,000 tpm.
With the patch and settings:

lock_wakeup_algorithm=0
PEAK: 962: 512: Medium Throughput: 161121.000 Avg Medium Resp: 0.051

When lock_wakeup_algorithm=1,
my PEAK increases to
PEAK 1560: 832: Medium Throughput: 176577.000 Avg Medium Resp: 0.086
(Couldn't recreate the 184K+ result; need to check that.)

I still haven't tested the remaining values 2-32, but you get the point: 
the patch is quite flexible, with various types of permutations and no 
overhead.


Do give it a try on your own setup, play with the values, and compare 
with your original builds.


Regards,
Jignesh

*** lwlock.cTue Mar 17 12:27:49 2009
--- lwlock.c.orig   Wed Mar 11 12:48:27 2009
***
*** 87,93 
  
  static int  lock_addin_request = 0;
  static bool lock_addin_request_allowed = true;
- int LWLockWakeupAlgorithm;
  
  #ifdef LWLOCK_STATS
  static int  counts_for_pid = 0;
--- 87,92 
***
*** 564,570 
PGPROC *head;
PGPROC *proc;
int i;
- int runq;
  
PRINT_LWDEBUG(LWLockRelease, lockid, lock);
  
--- 563,568 
***
*** 612,631 
 * as many waiters as want shared access.
 */
proc = head;
!   if (LWLockWakeupAlgorithm || !proc->lwExclusive)
!   {
!       if (LWLockWakeupAlgorithm <= 1)
!       {
            while (proc->lwWaitLink != NULL &&
!                  (proc->lwExclusive == proc->lwWaitLink->lwExclusive))
                proc = proc->lwWaitLink;
-       }
-       else
-       {
-           runq = LWLockWakeupAlgorithm;
-           while (proc->lwWaitLink != NULL && --runq)
-               proc = proc->lwWaitLink;
-       }
    }
/* proc is now the last PGPROC to be released */
lock->head = proc->lwWaitLink;
--- 610,620 
 * as many waiters as want shared access.
 */
proc = head;
!   if (!proc->lwExclusive)
!   {
        while (proc->lwWaitLink != NULL &&
!              !proc->lwWaitLink->lwExclusive)
            proc = proc->lwWaitLink;
    }
/* proc is now the last PGPROC to be released */
lock->head = proc->lwWaitLink;
*** lwlock.h.orig   Tue Mar 17 14:27:10 2009
--- lwlock.hTue Mar 17 08:24:40 2009
***
*** 103,106 
--- 103,107 
  
  extern void RequestAddinLWLocks(int n);
  
+ extern int LWLockWakeupAlgorithm; 
  #endif   /* LWLOCK_H */
*** guc.c.orig  Tue Mar 17 07:30:26 2009
--- guc.c   Tue Mar 17 07:47:10 2009
***
*** 57,62 
--- 57,63 
  #include postmaster/walwriter.h
  #include regex/regex.h
  #include storage/bufmgr.h
+ #include storage/lwlock.h
  #include storage/fd.h
  #include tcop/tcopprot.h
  #include tsearch/ts_cache.h
***
*** 167,172 
--- 168,174 
  static bool assign_maxconnections(int newval, bool doit, GucSource source);
  static bool assign_autovacuum_max_workers(int newval, bool doit, GucSource 
source);
  static bool assign_effective_io_concurrency(int newval, bool doit, GucSource 
source);
+ static bool assign_lock_wakeup_algorithm(int newval, bool doit, GucSource 
source);
  static const char *assign_pgstat_temp_directory(const char *newval, bool 
doit, GucSource source);
  
  static char *config_enum_get_options

Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-17 Thread Jignesh K. Shah



Simon Riggs wrote:

On Tue, 2009-03-17 at 17:41 -0400, Jignesh K. Shah wrote:

  

I did a quick test with patch. Unfortunately it improves my number
even with default setting 0 (not sure whether I should be pleased or
sad - Definitely no overhead infact seems to help performance a bit.
NOTE: Logic is same, implementation is slightly different for default
set)



OK, I bite. 25% gain from doing nothing??? You're stretching my... err,
credulity.

I like the train of thought for setting 1 and it is worth investigating,
but something feels wrong somewhere.

  
Actually, I think I am hurting my credibility here, since I cannot 
explain the improvement with the patch while it still uses the default 
logic (though implemented a different way: I detect a sequential run by 
comparing fields from the previous proc structure instead of comparing 
against a constant boolean). But the change was necessary to allow it to 
handle multiple algorithms and yet stay sleek and not bloated.


In the next couple of weeks I plan to test the patch on a different 
x64-based system, to do sanity testing on a lower number of cores and 
also to try out other workloads.


Regards,
Jignesh




Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-16 Thread Jignesh K. Shah



On 03/16/09 11:08, Gregory Stark wrote:

Jignesh K. Shah j.k.s...@sun.com writes:

  

Generally when there is a dead constant, it's a sign of a classic
bottleneck ;-)  We will be fixing one to get to another, but knocking
down bottlenecks is the name of the game, I think.



Indeed. I think the bottleneck we're interested in addressing here is why you
say you weren't able to saturate the 64 threads with 64 processes when they're
all RAM-resident.

From what I see you still have 400+ processes? Is that right?

  


Anyone claiming they run CPU-intensive workloads is not always telling 
the truth. They *think* they are CPU-intensive for the right part, but 
there can be memory misses; they can be doing statistics where they are 
not really stressing the intended code path; they can be parsing through 
the results without stressing the backend while still claiming to be 
CPU-intensive (though from a different perspective).


So yes, a single process, especially a client, cannot claim to keep the 
backend 100% active; but neither can a connection pooler, since it still 
has to do some other work within the process.


-Jignesh



Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-15 Thread Jignesh K. Shah



Simon Riggs wrote:

On Wed, 2009-03-11 at 16:53 -0400, Jignesh K. Shah wrote:

  

1200: 2000: Medium Throughput: -1781969.000 Avg Medium Resp: 0.019



I think you need to iron out bugs in your test script before we put too
much stock into the results generated. Your throughput should not be
negative.

I'd be interested in knowing the number of S and X locks requested, so
we can think about this from first principles. My understanding is that
ratio of S:X is about 10:1. Do you have more exact numbers?

  
Simon, that's a known bug in the test: the first time it reaches the 
max number of users, it throws a negative number. But all the other 
numbers are pretty much accurate.


Generally the users:transactions count depends on think time..

-Jignesh





Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-15 Thread Jignesh K. Shah



decibel wrote:

On Mar 11, 2009, at 10:48 PM, Jignesh K. Shah wrote:
Fair enough. Well, I am now appealing to all who have fairly 
decent-sized hardware to try it out and see whether there are gains, 
no changes, or regressions based on your workload. Also, it will help 
if you report the number of CPUs when you respond, to help collect 
feedback.



Do you have a self-contained test case? I have several boxes with 
16 cores' worth of Xeon and 96GB I could try it on (though you might 
not care about having only 16 cores :P)

I don't have authority over iGen, but I am pretty sure that with sysbench 
we should be able to recreate the test case, or even with dbt-2.
That said, the patch should be pretty easy to apply to your own workloads 
(where more feedback is much appreciated). On x64, 16 cores might bring 
out the problem faster too, since typically they run at 2.5X higher clock 
frequency. Try it out: stock builds vs. patched builds.



-Jignesh





Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-15 Thread Jignesh K. Shah



decibel wrote:

On Mar 13, 2009, at 3:02 PM, Jignesh K. Shah wrote:

vmstat seems similar to wakeup some
kthr      memory            page            disk          faults      cpu
r b w swap free re mf pi po fr de sr s0 s1 s2 sd in sy cs us sy id
63 0 0 45535728 38689856 0 14 0 0 0 0 0 0 0 0 0 163318 334225 360179 47 17 36
85 0 0 45436736 38690760 0 6 0 0 0 0 0 0 0 0 0 165536 347462 365987 47 17 36
59 0 0 45405184 38681752 0 11 0 0 0 0 0 0 0 0 0 155153 326182 345527 47 16 37
53 0 0 45393816 38673344 0 6 0 0 0 0 0 0 0 0 0 152752 317851 340737 47 16 37
66 0 0 45378312 38651920 0 11 0 0 0 0 0 0 0 0 0 150979 304350 336915 47 16 38
67 0 0 45489520 38639664 0 5 0 0 0 0 0 0 0 0 0 157188 318958 351905 47 16 37
82 0 0 45483600 38633344 0 10 0 0 0 0 0 0 0 0 0 168797 348619 375827 47 17 36
68 0 0 45463008 38614432 0 9 0 0 0 0 0 0 0 0 0 173020 376594 385370 47 18 35
54 0 0 45451376 38603792 0 13 0 0 0 0 0 0 0 0 0 161891 342522 364286 48 17 35
41 0 0 45356544 38605976 0 5 0 0 0 0 0 0 0 0 0 167250 358320 372469 47 17 36
27 0 0 45323472 38596952 0 11 0 0 0 0 0 0 0 0 0 165099 344695 364256 48 17 35



The good news is there are now at least enough runnable procs. What I 
find *extremely* odd is that the CPU usage is almost dead constant...
Generally when there is a dead constant, it's a sign of a classic 
bottleneck ;-)  We will be fixing one to get to another, but knocking 
down bottlenecks is the name of the game, I think.


-Jignesh





Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-13 Thread Jignesh K. Shah



Scott Carey wrote:

On 3/12/09 11:37 AM, Jignesh K. Shah j.k.s...@sun.com wrote:


And again, this is the third time I am saying it: the test users also
have some latency built into them, which is what is generally
exploited to get more users than the number of CPUs on the system, and
that is exactly the point we want to exploit. Otherwise, if all new
users began to do their job with no latency, then we would need 6+
billion CPUs to handle all possible users. Typically, as an
administrator (system and database), I can only tweak/control
latencies within my domain, that is, network, disk, CPUs, etc.; those
are what I am tweaking to arrive at a *configured* environment, and
now I am trying to improve lock contention/waits in PostgreSQL so that
we have an optimized setup.

In general, I suggest that it is useful to run tests with a few 
different types of pacing. Zero delay pacing will not have realistic 
number of connections, but will expose bottlenecks that are universal, 
and less controversial. Small latency (100ms to 1s) tests are easy to 
make from the zero delay ones, and help expose problems with 
connection count or other forms of ‘non-active’ concurrency. End-user 
realistic delays are app specific, and useful with larger holistic 
load tests (say, through the application interface). Generally, 
running them in this order helps because at each stage you are adding 
complexity. Based on your explanations, you’ve probably done much of 
this so far and your approach sounds solid to me.
If the first case fails (zero delay, smaller user count), there is no 
way the others will pass.





I think I have done that before, so I can do it again by running the 
users at 0 think time, which represents a highly utilized connection 
pool, and test how big the connection pool can be before the 
throughput tanks.. This can be useful for app servers which set up 
connection pools of their own talking to PostgreSQL.
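As a back-of-the-envelope check on pool sizing, Little's Law relates throughput, response time, and think time to the number of truly active users. A sketch (the formula is standard; the numbers plugged in are taken from the runs quoted in this thread, purely for illustration):

```python
def active_users(tpm, resp_s, think_s=0.0):
    """Average in-flight users implied by Little's Law: N = X * (R + Z),
    where X is throughput per second, R response time, Z think time."""
    return (tpm / 60.0) * (resp_s + think_s)

# ~133K tpm at a 7 ms response and zero think time (as in the 0-think-time
# runs in this thread) keeps only ~16 backends truly active at any instant.
zero_think = active_users(133354, 0.007)           # ~15.6
# The same throughput with a 200 ms think time needs ~460 connected users.
with_think = active_users(133354, 0.007, 0.200)    # ~460.1
```

This is one way to estimate how far a pooler could shrink the connection count before throughput should suffer.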


-Jignesh


--
Jignesh Shah   http://blogs.sun.com/jkshah  
The New Sun Microsystems,Inc   http://sun.com/postgresql


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


8.4 Performance improvements: was Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-13 Thread Jignesh K. Shah



Greg Smith wrote:

On Thu, 12 Mar 2009, Jignesh K. Shah wrote:

As soon as I get more cycles I will try variations of it but it 
would help if others can try it out in their own environments to see 
if it helps their instances.


What you should do next is see whether you can remove the bottleneck 
your test is running into via using a connection pooler.  That's what 
I think most informed people would do were you to ask how to setup an 
optimal environment using PostgreSQL that aimed to serve thousands of 
clients. If that makes your bottleneck go away, that's what you should 
be recommending to customers who want to scale in this fashion too.  
If the bottleneck moves to somewhere else, that new hot spot might be 
one people care more about.  Given that there are multiple good 
pooling solutions floating around already, it's hard to justify 
dumping coding and testing resources here if that makes the problem 
move somewhere else.


It's great that you've identified an alternate scheduling approach 
that helps on your problematic test case, but you're a long ways from 
having a full model of how changes to the locking model impact other 
database workloads.  As for the idea of doing something in this area 
for 8.4, there are a significant number of performance-related changes 
already committed for that version that deserve more focused testing 
during beta.  You're way too late to throw another one into that 
already crowded area.




On the other hand I have taken up a task of showing 8.4 Performance 
improvements over 8.3.
Can we do a vote on which specific performance features we want to test? 
I can use dbt2, dbt3 tests to see how 8.4 performs and compare it with 
8.3? Also if you have your own favorite test to test it out let me 
know.. I have allocated some time for this task so it is feasible for me 
to do this.


Many of the improvements may not be visible through these standard tests, 
so feedback on testing methodology for them is also appreciated.
* Visibility map - Reduce Vacuum overhead - (I think I can time vacuum 
with some usage on both databases)
* Prefetch IO with posix_fadvise() - though I am not sure whether it is 
supported on UNIX or not (but it can be tested by standard tests)

* Parallel pg_restore (Can be tested with a big database dump)
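For the prefetch item, a quick way to check whether posix_fadvise() is usable on a given platform is a small probe. A sketch only (assumes Python 3.3+ on a POSIX system, where the call is exposed as os.posix_fadvise; POSIX_FADV_WILLNEED is the hint that asks the kernel to start reading a range asynchronously):

```python
import os
import tempfile

def prefetch_hint_supported():
    """Return True if POSIX_FADV_WILLNEED can be issued on this platform."""
    if not hasattr(os, "posix_fadvise"):
        return False
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"x" * 8192)
        f.flush()
        # Ask the kernel to begin reading the first 8 KB in the background.
        os.posix_fadvise(f.fileno(), 0, 8192, os.POSIX_FADV_WILLNEED)
    return True

print("posix_fadvise available:", prefetch_hint_supported())
```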

Any more features that I can stress during the testing phase?



Regards,
Jignesh


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD


--
Jignesh Shah   http://blogs.sun.com/jkshah  
The New Sun Microsystems,Inc   http://sun.com/postgresql


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-13 Thread Jignesh K. Shah



Gregory Stark wrote:

Jignesh K. Shah j.k.s...@sun.com writes:

  

Scott Carey wrote:


On 3/12/09 11:37 AM, Jignesh K. Shah j.k.s...@sun.com wrote:

In general, I suggest that it is useful to run tests with a few different
types of pacing. Zero delay pacing will not have realistic number of
connections, but will expose bottlenecks that are universal, and less
controversial
  

I think I have done that before, so I can do it again by running the users at
0 think time, which represents a highly utilized connection pool, and test
how big the connection pool can be before the throughput tanks.. This can be
useful for app servers which set up connection pools of their own talking
to PostgreSQL.



Keep in mind when you do this that it's not interesting to test a number of
connections much larger than the number of processors you have. Once the
system reaches 100% cpu usage it would be a misconfigured connection pooler
that kept more than that number of connections open.

  


Greg, unfortunately the problem is that.. I am trying to reach 100% CPU, which I 
cannot, and hence I am increasing the user count :-)

-Jignesh



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-13 Thread Jignesh K. Shah





In general, I suggest that it is useful to run tests with a few 
different types of pacing. Zero delay pacing will not have realistic 
number of connections, but will expose bottlenecks that are 
universal, and less controversial. Small latency (100ms to 1s) tests 
are easy to make from the zero delay ones, and help expose problems 
with connection count or other forms of ‘non-active’ concurrency. 
End-user realistic delays are app specific, and useful with larger 
holistic load tests (say, through the application interface). 
Generally, running them in this order helps because at each stage you 
are adding complexity. Based on your explanations, you’ve probably 
done much of this so far and your approach sounds solid to me.
If the first case fails (zero delay, smaller user count), there is no 
way the others will pass.





I think I have done that before, so I can do it again by running the 
users at 0 think time, which represents a highly utilized connection 
pool, and test how big the connection pool can be before 
the throughput tanks.. This can be useful for app servers which set 
up connection pools of their own talking to PostgreSQL.


-Jignesh


So I backed out my change and used the stock 8.4 snapshot that I had 
downloaded.. Now with 0 think time I do runs with a lot fewer users.. 
still I cannot get it to go to 100% CPU

60: 8: Medium Throughput: 7761.000 Avg Medium Resp: 0.004
120: 16: Medium Throughput: 16876.000 Avg Medium Resp: 0.004
180: 24: Medium Throughput: 25359.000 Avg Medium Resp: 0.004
240: 32: Medium Throughput: 33104.000 Avg Medium Resp: 0.005
300: 40: Medium Throughput: 42200.000 Avg Medium Resp: 0.005
360: 48: Medium Throughput: 49996.000 Avg Medium Resp: 0.005
420: 56: Medium Throughput: 58260.000 Avg Medium Resp: 0.005
480: 64: Medium Throughput: 66289.000 Avg Medium Resp: 0.005
540: 72: Medium Throughput: 74667.000 Avg Medium Resp: 0.005
600: 80: Medium Throughput: 82632.000 Avg Medium Resp: 0.005
660: 88: Medium Throughput: 90211.000 Avg Medium Resp: 0.006
720: 96: Medium Throughput: 98236.000 Avg Medium Resp: 0.006
780: 104: Medium Throughput: 105517.000 Avg Medium Resp: 0.006
840: 112: Medium Throughput: 112921.000 Avg Medium Resp: 0.006
900: 120: Medium Throughput: 118256.000 Avg Medium Resp: 0.007
960: 128: Medium Throughput: 126499.000 Avg Medium Resp: 0.007
1020: 136: Medium Throughput: 133354.000 Avg Medium Resp: 0.007
1080: 144: Medium Throughput: 135826.000 Avg Medium Resp: 0.008
1140: 152: Medium Throughput: 121729.000 Avg Medium Resp: 0.012
1200: 160: Medium Throughput: 130487.000 Avg Medium Resp: 0.011
1260: 168: Medium Throughput: 123368.000 Avg Medium Resp: 0.013
1320: 176: Medium Throughput: 134649.000 Avg Medium Resp: 0.012
1380: 184: Medium Throughput: 136272.000 Avg Medium Resp: 0.013


Vmstat shows that the CPUs are hardly busy on the 64-CPU system (CPUs are 
reported busy when there is an active process assigned to the CPU):

-bash-3.2$ vmstat 30
kthr      memory            page            disk          faults      cpu
r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 sd   in   sy   cs us sy id
19 0 0 52691088 46220848 27 302 10 68 68 0 3 1 -0 -0 -0 13411 20762 26854 5 3 92
0 0 0 45095664 39898296 0 455 0 0 0  0  0  0  0  0  0  698  674  295 0  0 100
0 0 0 45040640 39867056 5 13 0 0  0  0  0  0  0  0  0 3925 4189 5721 0  0 99
0 0 0 45038856 39864016 0 5 0  0  0  0  0  0  0  0  0 9479 8643 15205 1  1 98
0 0 0 45037760 39862552 0 14 0 0  0  0  0  0  0  0  0 12088 9041 19890 2 1 98
0 0 0 45035960 39860080 0 6 0  0  0  0  0  0  0  0  0 16590 11611 28351 2 1 97
0 0 0 45034648 39858416 0 17 0 0  0  0  0  0  0  0  0 19192 13027 33218 3 1 96
0 0 0 45032360 39855464 0 10 0 0  0  0  0  0  0  0  0 22795 16467 40392 4 1 95
0 0 0 45030840 39853568 0 22 0 0  0  0  0  0  0  0  0 25349 18315 45178 4 1 94
0 0 0 45027456 39849648 0 10 0 0  0  0  0  0  0  0  0 28158 22500 50804 5 2 93
0 0 0 45000752 39832608 0 38 0 0  0  0  0  0  0  0  0 31332 25744 56751 6 2 92
0 0 0 45010120 39836728 0 6 0  0  0  0  0  0  0  0  0 36636 29334 66505 7 2 91
0 0 0 45017072 39838504 0 29 0 0  0  0  0  0  0  0  0 38553 32313 70915 7 2 91
0 0 0 45011384 39833768 0 11 0 0  0  0  0  0  0  0  0 41186 35949 76275 8 3 90
0 0 0 44890552 39826136 0 40 0 0  0  0  0  0  0  0  0 45123 44507 83665 9 3 88
0 0 0 44882808 39822048 0 6 0  0  0  0  0  0  0  0  0 49342 53431 91783 10 3 87
0 0 0 45003328 39825336 0 42 0 0  0  0  0  0  0  0  0 48516 42515 91135 10 3 87
0 0 0 44999688 39821008 0 6 0  0  0  0  0  0  0  0  0 54695 48741 102526 11 3 85

kthr      memory            page            disk          faults      cpu
r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 sd   in   sy   cs us sy id
0 0 0 44980744 39806400 0 55 0 0  0  0  0  0  0  0  0 54968 51946 103245 12 4 84
0 0 0 44992288 39812256 0 6 0  1  1  0  0  0  0  0  0 60506 58205 113911 13 4 83
0 0 0 44875648 39802128 1 60 0 0  0  0  0  1  0  0  0 60485 66576 114081 13 4 83
0 0 0 44848792 39795008 0 8 0  0  0  0  

Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-13 Thread Jignesh K. Shah



Now with a modified fix (not the original one that I proposed, but 
something that works like a heart valve: it opens and shuts back to the 
default minimum, thus controlling how many waiters are woken up):
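A rough sketch of the wakeup policy being described, as a toy queue model (this is illustrative only, not the actual lwlock.c patch; 'S'/'X' stand in for shared and exclusive waiters on the PGPROC wait list):

```python
from collections import deque

def wake_batch(queue):
    """Pop and return the leading run of same-mode waiters ('S' or 'X').

    Each 'valve opening' releases one burst: the waiter at the head of the
    queue plus every consecutive waiter behind it in the same mode, rather
    than a single waiter or the whole queue.
    """
    woken = []
    if queue:
        mode = queue[0]
        while queue and queue[0] == mode:
            woken.append(queue.popleft())
    return woken

q = deque(["S", "S", "X", "X", "S"])
print(wake_batch(q))  # first opening releases the two leading shared waiters
print(wake_batch(q))  # next opening releases the exclusive burst
```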


Time:Users:throughput: Response
60: 8: Medium Throughput: 7774.000 Avg Medium Resp: 0.004
120: 16: Medium Throughput: 16874.000 Avg Medium Resp: 0.004
180: 24: Medium Throughput: 25159.000 Avg Medium Resp: 0.004
240: 32: Medium Throughput: 33216.000 Avg Medium Resp: 0.005
300: 40: Medium Throughput: 42418.000 Avg Medium Resp: 0.005
360: 48: Medium Throughput: 49655.000 Avg Medium Resp: 0.005
420: 56: Medium Throughput: 58149.000 Avg Medium Resp: 0.005
480: 64: Medium Throughput: 66558.000 Avg Medium Resp: 0.005
540: 72: Medium Throughput: 74474.000 Avg Medium Resp: 0.005
600: 80: Medium Throughput: 82241.000 Avg Medium Resp: 0.005
660: 88: Medium Throughput: 90336.000 Avg Medium Resp: 0.005
720: 96: Medium Throughput: 99101.000 Avg Medium Resp: 0.006
780: 104: Medium Throughput: 106028.000 Avg Medium Resp: 0.006
840: 112: Medium Throughput: 113196.000 Avg Medium Resp: 0.006
900: 120: Medium Throughput: 119174.000 Avg Medium Resp: 0.006
960: 128: Medium Throughput: 129408.000 Avg Medium Resp: 0.006
1020: 136: Medium Throughput: 134433.000 Avg Medium Resp: 0.007
1080: 144: Medium Throughput: 143121.000 Avg Medium Resp: 0.007
1140: 152: Medium Throughput: 144603.000 Avg Medium Resp: 0.007
1200: 160: Medium Throughput: 148604.000 Avg Medium Resp: 0.008
1260: 168: Medium Throughput: 150274.000 Avg Medium Resp: 0.009
1320: 176: Medium Throughput: 150581.000 Avg Medium Resp: 0.010
1380: 184: Medium Throughput: 146912.000 Avg Medium Resp: 0.012
1440: 192: Medium Throughput: 143945.000 Avg Medium Resp: 0.013
1500: 200: Medium Throughput: 144029.000 Avg Medium Resp: 0.015
1560: 208: Medium Throughput: 143468.000 Avg Medium Resp: 0.016
1620: 216: Medium Throughput: 144367.000 Avg Medium Resp: 0.017
1680: 224: Medium Throughput: 148340.000 Avg Medium Resp: 0.017
1740: 232: Medium Throughput: 148842.000 Avg Medium Resp: 0.018
1800: 240: Medium Throughput: 149533.000 Avg Medium Resp: 0.019
1860: 248: Medium Throughput: 152334.000 Avg Medium Resp: 0.019
1920: 256: Medium Throughput: 151521.000 Avg Medium Resp: 0.020
1980: 264: Medium Throughput: 148961.000 Avg Medium Resp: 0.022
2040: 272: Medium Throughput: 151270.000 Avg Medium Resp: 0.022
2100: 280: Medium Throughput: 149783.000 Avg Medium Resp: 0.024
2160: 288: Medium Throughput: 151743.000 Avg Medium Resp: 0.024
2220: 296: Medium Throughput: 155190.000 Avg Medium Resp: 0.026
2280: 304: Medium Throughput: 150955.000 Avg Medium Resp: 0.027
2340: 312: Medium Throughput: 147118.000 Avg Medium Resp: 0.029
2400: 320: Medium Throughput: 152768.000 Avg Medium Resp: 0.029
2460: 328: Medium Throughput: 161044.000 Avg Medium Resp: 0.028
2520: 336: Medium Throughput: 157926.000 Avg Medium Resp: 0.029
2580: 344: Medium Throughput: 161005.000 Avg Medium Resp: 0.029
2640: 352: Medium Throughput: 167274.000 Avg Medium Resp: 0.029
2700: 360: Medium Throughput: 168253.000 Avg Medium Resp: 0.031


With the final vmstat numbers improving, but still far from 100%:
kthr      memory            page            disk          faults      cpu
r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 sd   in   sy   cs us sy id
38 0 0 46052840 39345096 0 11 0 0 0  0  0  0  0  0  0 134137 290703 303518 40 14 45
43 0 0 45656456 38882912 23 77 0 0 0 0  0  0  0  0  0 135820 272899 300749 40 15 45
38 0 0 45650488 38816984 23 80 0 0 0 0  0  0  0  0  0 135009 272767 300192 39 15 46
47 0 0 46020792 39187688 0 5 0 0  0  0  0  0  0  0  0 140473 285445 312826 40 14 46
24 0 0 46143984 39326848 9 61 0 0 0  0  0  0  0  0  0 146194 308590 328241 40 15 45
37 0 0 45465256 38757000 22 74 0 0 0 0  0  0  0  0  0 136835 293971 301433 38 14 48
35 0 0 46017544 39308072 12 61 0 0 0 0  0  0  0  0  0 142749 312355 320592 42 15 43
36 0 0 45456000 38744688 11 24 0 0 0 0  0  0  0  0  0 143566 303461 317683 41 15 43
23 0 0 46007408 39291312 2 22 0 0 0  0  0  0  0  0  0 140246 300061 316663 42 15 43
20 0 0 46029656 39281704 10 25 0 0 0 0  0  0  0  0  0 147787 291825 326387 43 15 42
24 0 0 46131016 39288528 2 21 0 0 0  0  0  0  0  0  0 150796 310697 335791 43 15 42
20 0 0 46109448 39269392 16 67 0 0 0 0  0  0  0  0  0 150075 315517 332881 43 16 41
30 0 0 45540928 38710376 9 27 0 0 0  0  0  0  0  0  0 155214 316448 341472 43 16 40
14 0 0 45987496 39270016 0 5 0 0  0  0  0  0  0  0  0 155028 333711 344207 44 16 40
25 0 0 45981136 39263008 0 10 0 0 0  0  0  0  0  0  0 153968 327343 343776 45 16 39
54 0 0 46062984 39259936 0 7 0 0  0  0  0  0  0  0  0 153721 315839 344732 45 16 39
42 0 0 46099704 39252920 0 15 0 0 0  0  0  0  0  0  0 154629 323125 348798 45 16 39
54 0 0 46068944 39230808 0 8 0 0  0  0  0  0  0  0  0 157166 340265 354135 46 17 37


But the real winner shows up in lockstat, which seems to indicate that 
the stress of Waiting on ProcArrayLock is relieved (though it shifts 
somewhere else, which is how locks work):


# 

Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-13 Thread Jignesh K. Shah



Scott Carey wrote:

On 3/13/09 8:55 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote:

 Jignesh K. Shah j.k.s...@sun.com wrote:
 usr sys wt idl sze
 38 11 0 50 64

The fact that you're maxing out at 50% CPU utilization has me
wondering -- are there really 64 CPUs here, or are there 32 CPUs with
hyperthreading technology (or something conceptually similar)?

-Kevin

It's a Sun T1000 or T2000 type box, which are 4 threads per processor 
core IIRC. It's in his first post:


“
UltraSPARC T2 based 1 socket (64 threads) and 2 socket (128 threads)
servers that Sun sells.
“

These processors use an in-order execution engine and fill the bubbles 
in the pipelines with SMT (the non-marketing name for hyperthreading).
They are rather efficient at it though, more so than Intel’s first stab 
at it. And Intel’s next-generation chips, hitting the streets in 
servers in less than a month, have it again. 



These are UltraSPARC T2 Plus, which is 8 threads per core (aka CMT for us) 
.. Though the CPU% reported by vmstat is based more on scheduled-in 
execution than on what is actually executed by the computing engine of the 
core.. So unless you have 100% scheduled-in execution on the thread, it 
won't be executing ..
So if you want to read mpstat right: you may not be executing everything 
that is shown as executing, but you are definitely NOT going to execute 
anything that is not shown as executing.. My goal is to reach a level 
where we can show PostgreSQL effectively getting to 100% CPU in, say, 
vmstat/mpstat first...


-Jignesh

--
Jignesh Shah   http://blogs.sun.com/jkshah  
The New Sun Microsystems,Inc   http://sun.com/postgresql


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-13 Thread Jignesh K. Shah
Somebody else asked a question: this is actually a two-socket machine 
(128 threads), but one socket is disabled by the OS, so only 64 threads 
are available... The idea being: let me choke one socket first with 100% 
CPU ..
Forgot some data: with the second test above, CPU: 48% user, 18% sys, 
35% idle. CPU increased from 46% used in the first test to 65% used, 
the corresponding throughput increase was not as large, but that is 
expected on an 8-threads per core server since memory bandwidth and 
cache resources at a minimum are shared and only trivial tasks can 
scale 100%.


Based on the above, I would guess that attaining closer to 100% 
utilization (it's hard to get past 90% with that many cores no matter 
what) will probably give another 10 to 15% improvement at most, to 
maybe 18/min throughput.


It's also rather interesting that the 2000-connection case with wait 
times gets 17/min throughput and beats the 328-users-with-0-delay 
result above. I suspect the ‘wake all’ version is just faster. I would 
love to see a ‘wake all shared, leave exclusives at front of queue’ 
version, since that would not allow lock starvation. 
Considering that there is one linked list, it is just easier to wake the 
sequentially selected few or wake them all up.. If I go through the list 
trying to wake all the shared ones, then I essentially need to have 
another linked list to collect all the exclusives ...
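The single-pass split being described can be sketched with toy lists of 'S'/'X' modes standing in for the PGPROC wait list (illustrative only, not the lwlock.c code): waking every shared waiter means the exclusives encountered along the way get collected, in order, into a second list that becomes the new wait queue.

```python
def wake_all_shared(queue):
    """Return (woken shared waiters, remaining exclusive-only queue).

    One pass over the wait list: shared waiters ('S') are woken; exclusive
    waiters ('X') are appended to a second list, preserving queue order.
    """
    woken, remaining = [], []
    for waiter in queue:
        (woken if waiter == "S" else remaining).append(waiter)
    return woken, remaining

woken, queue = wake_all_shared(["S", "X", "S", "X", "S"])
# woken == ['S', 'S', 'S']; queue == ['X', 'X'], order preserved
```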


I will retry the thundering herd of waking all waiters, irrespective of 
shared or exclusive, and see how that behaves.. I think the biggest benefit 
is when the woken process is in reality already on the CPU, checking the 
field to see whether the last guy who released the lock is allowing it 
to wake up or not.


Still, I will try some more experiments.. Definitely, reducing time spent 
in Waiting lock waits benefits, and making Acquired times more efficient 
results in more tpm per user.


I will try another run with a plain wake-up-all and see how the test 
behaves with the same parameters (0 think time)..


-Jignesh

--
Jignesh Shah   http://blogs.sun.com/jkshah  
The New Sun Microsystems,Inc   http://sun.com/postgresql


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-13 Thread Jignesh K. Shah
                             Exclusive    Acquired          116
        ProcArrayLock        Exclusive    Acquired          117
      CLogControlLock        Exclusive    Acquired          176
        WALInsertLock        Exclusive    Acquired          370
     FirstLockMgrLock        Shared       Acquired          793
  FirstBufMappingLock        Shared       Acquired          799
        ProcArrayLock        Shared       Acquired          882
       SInvalReadLock        Shared       Acquired         1827

  Lock Id                    Mode         State    Combined Time (ns)
        WALInsertLock                     Acquired        52915
      CLogControlLock                     Acquired        78332
           XidGenLock                     Acquired       103026
     FirstLockMgrLock                     Acquired       392836
  FirstBufMappingLock                     Acquired      2919896
      CLogControlLock        Shared       Waiting       5342211
      CLogControlLock        Exclusive    Waiting       9172692
        ProcArrayLock        Shared       Waiting      18186546
        ProcArrayLock                     Acquired     22478607
           XidGenLock        Exclusive    Acquired     26561444
       SInvalReadLock                     Acquired     29012891
      CLogControlLock        Exclusive    Acquired     30490159
        WALInsertLock        Exclusive    Waiting      35055294
     FirstLockMgrLock        Exclusive    Acquired     47077668
  FirstBufMappingLock        Exclusive    Acquired     47460381
        WALInsertLock        Exclusive    Acquired     99288648
        ProcArrayLock        Exclusive    Waiting     104221100
        ProcArrayLock        Exclusive    Acquired    356644807
       SInvalReadLock        Exclusive    Acquired    357530794



So clearly, even waking up a few more exclusives than just 1 seems to help 
scalability (though actual improvement mileage varies, there is some 
positive improvement).


One more change that I can think of doing is a minor one: currently we 
wake all sequential shared waiters but only 1 exclusive waiter.. I am 
going to change that to: whatever sequential run you get, wake them all 
up.. so in essence it does a similar heart-valve-type approach of doing 
little bursts rather than tying them to 1 exclusive only.



-Jignesh


Jignesh K. Shah wrote:



Now with a modified fix (not the original one that I proposed, but 
something that works like a heart valve: it opens and shuts back to the 
default minimum, thus controlling how many waiters are woken up):


Time:Users:throughput: Response
60: 8: Medium Throughput: 7774.000 Avg Medium Resp: 0.004
120: 16: Medium Throughput: 16874.000 Avg Medium Resp: 0.004
180: 24: Medium Throughput: 25159.000 Avg Medium Resp: 0.004
240: 32: Medium Throughput: 33216.000 Avg Medium Resp: 0.005
300: 40: Medium Throughput: 42418.000 Avg Medium Resp: 0.005
360: 48: Medium Throughput: 49655.000 Avg Medium Resp: 0.005
420: 56: Medium Throughput: 58149.000 Avg Medium Resp: 0.005
480: 64: Medium Throughput: 66558.000 Avg Medium Resp: 0.005
540: 72: Medium Throughput: 74474.000 Avg Medium Resp: 0.005
600: 80: Medium Throughput: 82241.000 Avg Medium Resp: 0.005
660: 88: Medium Throughput: 90336.000 Avg Medium Resp: 0.005
720: 96: Medium Throughput: 99101.000 Avg Medium Resp: 0.006
780: 104: Medium Throughput: 106028.000 Avg Medium Resp: 0.006
840: 112: Medium Throughput: 113196.000 Avg Medium Resp: 0.006
900: 120: Medium Throughput: 119174.000 Avg Medium Resp: 0.006
960: 128: Medium Throughput: 129408.000 Avg Medium Resp: 0.006
1020: 136: Medium Throughput: 134433.000 Avg Medium Resp: 0.007
1080: 144: Medium Throughput: 143121.000 Avg Medium Resp: 0.007
1140: 152: Medium Throughput: 144603.000 Avg Medium Resp: 0.007
1200: 160: Medium Throughput: 148604.000 Avg Medium Resp: 0.008
1260: 168: Medium Throughput: 150274.000 Avg Medium Resp: 0.009
1320: 176: Medium Throughput: 150581.000 Avg Medium Resp: 0.010
1380: 184: Medium Throughput: 146912.000 Avg Medium Resp: 0.012
1440: 192: Medium Throughput: 143945.000 Avg Medium Resp: 0.013
1500: 200: Medium Throughput: 144029.000 Avg Medium Resp: 0.015
1560: 208: Medium Throughput: 143468.000 Avg Medium Resp: 0.016
1620: 216: Medium Throughput: 144367.000 Avg Medium Resp: 0.017
1680: 224: Medium Throughput: 148340.000 Avg Medium Resp: 0.017
1740: 232: Medium Throughput: 148842.000 Avg Medium Resp: 0.018
1800: 240: Medium Throughput: 149533.000 Avg Medium Resp: 0.019
1860: 248: Medium Throughput: 152334.000 Avg Medium Resp: 0.019
1920: 256: Medium Throughput: 151521.000 Avg Medium Resp: 0.020
1980: 264: Medium Throughput: 148961.000 Avg Medium Resp: 0.022
2040: 272: Medium Throughput: 151270.000 Avg Medium Resp: 0.022
2100: 280: Medium Throughput: 149783.000

Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-12 Thread Jignesh K. Shah



On 03/11/09 22:01, Scott Carey wrote:

On 3/11/09 3:27 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote:


I'm a lot more interested in what's happening between 60 and 180 than
over 1000, personally.  If there was a RAID involved, I'd put it down
to better use of the numerous spindles, but when it's all in RAM it
makes no sense.

If there is enough lock contention and a common lock case is a short-lived 
shared lock, it makes perfect sense.  Fewer readers are 
blocked waiting on writers at any given time.  Readers can 'cut' in 
line ahead of writers within a certain scope (only up to the number 
waiting at the time a shared lock is at the head of the queue). 
 Essentially this clumps up shared and exclusive locks into larger 
streaks, and allows for higher shared lock throughput.  
Exclusive locks may be delayed, but will NOT be starved, since on the 
next iteration, a streak of exclusive locks will occur first in the 
list and they will all process before any more shared locks can go.


This will even help on a single-CPU system if it is read dominated, 
lowering read latency and slightly increasing write latency.


If you want to make this more fair, instead of freeing all shared 
locks, limit the count to some number, such as the number of CPU 
cores.  Perhaps rather than wake-up-all-waiters=true, the parameter 
can be an integer representing how many shared locks can be freed at 
once if an exclusive lock is encountered.



Well, I am waking up not just shared but shared and exclusives.. However, 
I like your idea of waking up the next N waiters where N matches the 
number of CPUs available.  In my case it is 64, so yes this works well, 
since the idea is that of all the 64 waiters running right now, one will be 
able to take the lock immediately, and hence there are no cycles 
wasted where nobody gets the lock, which is often the case when you 
wake up only 1 waiter and hope that the process is on the CPU (in 
my case it is 64 processes) and is able to acquire the lock.. The 
probability of acquiring the lock within the next few cycles is much 
lower for only 1 waiter than when you give the chance to 64 such processes 
and then let them fight, based on who is already on the CPU, to acquire the 
lock. That way the period where nobody has the lock is reduced, and that 
helps to cut out artificial idle time on the system.
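The probability argument above can be made concrete with a toy model (the per-process probability p and the counts are illustrative assumptions, not measured values): if each woken process independently has probability p of being on a CPU within the next few cycles, waking n of them raises the chance that at least one can take the lock immediately.

```python
def chance_someone_ready(p, n):
    """Probability at least one of n independently-woken processes is
    on a CPU in time to grab the lock, given per-process probability p."""
    return 1.0 - (1.0 - p) ** n

p_one = chance_someone_ready(0.05, 1)    # a lone waiter: rarely ready
p_many = chance_someone_ready(0.05, 64)  # 64 waiters: well above 95%
```

Under this model the lock is almost never left unclaimed when a full CPU's worth of waiters is released at once, which matches the reduced idle time observed.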



As soon as I get more cycles I will try variations of it but it would 
help if others can try it out in their own environments to see if it 
helps their instances.



-Jignesh



Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-12 Thread Jignesh K. Shah



On 03/12/09 11:13, Kevin Grittner wrote:
Scott Carey sc...@richrelevance.com wrote: 


Kevin Grittner kevin.gritt...@wicourts.gov wrote:



I'm a lot more interested in what's happening between 60 and 180
than over 1000, personally.  If there was a RAID involved, I'd put
it down to better use of the numerous spindles, but when it's all
in RAM it makes no sense.
  

If there is enough lock contention and a common lock case is a short-lived
shared lock, it makes perfect sense.  Fewer readers are
blocked waiting on writers at any given time.  Readers can 'cut' in
line ahead of writers within a certain scope (only up to the number
waiting at the time a shared lock is at the head of the queue). 
Essentially this clumps up shared and exclusive locks into larger

streaks, and allows for higher shared lock throughput.

 
You misunderstood me.  I wasn't addressing the effects of his change,

but rather the fact that his test shows a linear improvement in TPS up
to 1000 connections for a 64 thread machine which is dealing entirely
with RAM -- no disk access.  Where's the bottleneck that allows this
to happen?  Without understanding that, his results are meaningless.
 
-Kevin


  


Every user has a think time (200 ms) to wait before doing the next 
transaction, which results in idle time and theoretically allows other 
users to run in between ..


-Jignesh



Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-12 Thread Jignesh K. Shah



On 03/12/09 13:48, Scott Carey wrote:

On 3/11/09 7:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:

All I'm adding is that it makes some sense to me based on my 
experience in CPU / RAM bound scalability tuning.  It was expressed 
that the test itself didn't even make sense.


I was wrong in my understanding of what the change did.  If it wakes 
ALL waiters up there is an indeterminate amount of time a lock will wait.
However, if instead of waking up all of them, if it only wakes up the 
shared readers and leaves all the exclusive ones at the front of the 
queue, there is no possibility of starvation since those exclusives 
will be at the front of the line after the wake-up batch.


As for this being a use case that is important:

*  SSDs will drive the % of use cases that are not I/O bound up 
significantly over the next couple years.  All postgres installations 
with less than about 100GB of data TODAY could avoid being I/O bound 
with current SSD technology, and those less than 2TB can do so as well 
but at high expense or with less proven technology like the ZFS L2ARC 
flash cache.
*  Intel will have a mainstream CPU that handles 12 threads (6 cores, 
2 threads each) at the end of this year.  Mainstream two CPU systems 
will have access to 24 threads and be common in 2010.  Higher end 4CPU 
boxes will have access to 48 CPU threads.  Hardware thread count is 
only going up.  This is the future.




SSDs are precisely my motivation for doing RAM-based tests with 
PostgreSQL. While I am waiting for my SSDs to arrive, I started to 
emulate SSDs by putting the whole database in RAM, which in a sense is 
better than SSDs, so if we can tune with RAM disks then SSDs will be covered.


What we have is a pool of 2000 users, and we start making each user do a 
series of transactions on different rows and see how much the database 
can handle linearly before some bottleneck (system or database) kicks in 
and there can be no more linear increase in active users. Many times 
there is a drop after reaching some number of active users. If all 2000 
users can scale linearly, then another test with, say, 2500 can be executed 
.. All we do is find the limit we can reach before there are no 
system resources remaining to be exploited.


That said, the testkit that I am using is a lightweight OLTP-ish 
workload which a user runs against a known schema, and between the various 
transactions that it does it emulates a wait time of 200ms. It is in 
some sense emulating a real user who clicks, then waits to see 
what he got, and does another click which results in another transaction 
happening.  (Not exactly, but you get the point.)  Like all such workloads it 
is generally used to find bottlenecks in systems before putting 
production stuff on them.
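A minimal sketch of the pacing loop such a testkit emulates (this is not the actual test harness; do_txn and the sleep function are injectable assumptions so the pacing logic can be exercised without a real database or real delays):

```python
import time

def run_user(n_txn, think_s, do_txn, sleep=time.sleep):
    """Run one simulated user: a transaction, then a think-time pause,
    repeated n_txn times. 'sleep' is injectable for testing."""
    for _ in range(n_txn):
        do_txn()
        sleep(think_s)

# Example: count transactions and record think times without sleeping.
done, waits = [], []
run_user(5, 0.2, lambda: done.append(1), sleep=waits.append)
# done has 5 entries; waits == [0.2] * 5
```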



That said, in my current environment I am running similar workloads and 
seeing how many users can go to the point where the system has no more CPU 
resources available for a linear growth in tpm. Generally, as many of 
you mentioned, you will see disk latency, network latency, CPU resource 
problems, etc.. And that's the work I am doing right now.. I am working 
around network latency by using a private network, and improving operating 
system tunables to improve efficiency out there.. I am improving disk 
latency by putting the data on /RAM (and soon on SSDs).. However, if I still 
cannot consume all the CPU, then it means I am probably hit by locks. Using 
the PostgreSQL DTrace probes I can see what's happening..


At low user counts (100 users), my lock profiles from a user's point of 
view are as follows:



# dtrace -q -s 84_lwlock.d 1764

 Lock IdMode   State   Count
   ProcArrayLock  Shared Waiting   1
 CLogControlLock  SharedAcquired   2
   ProcArrayLock   Exclusive Waiting   3
   ProcArrayLock   ExclusiveAcquired  24
  XidGenLock   ExclusiveAcquired  24
FirstLockMgrLock  SharedAcquired  25
 CLogControlLock   ExclusiveAcquired  26
 FirstBufMappingLock  SharedAcquired  55
   WALInsertLock   ExclusiveAcquired  75
   ProcArrayLock  SharedAcquired 178
  SInvalReadLock  SharedAcquired 378

 Lock IdMode   State   Combined Time (ns)
  SInvalReadLockAcquired29849
   ProcArrayLock  Shared Waiting92261
   ProcArrayLockAcquired   951470
FirstLockMgrLock   ExclusiveAcquired  1069064
 CLogControlLock   ExclusiveAcquired  1295551
   ProcArrayLock   Exclusive Waiting  1758033
 FirstBufMappingLock   Exclusive  

Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-12 Thread Jignesh K. Shah



On 03/12/09 15:10, Alvaro Herrera wrote:

Tom Lane wrote:
  

Scott Carey sc...@richrelevance.com writes:


They are not meaningless.  It is certainly more to understand, but the test is 
entirely valid without that.  In a CPU bound / RAM bound case, as concurrency 
increases you look for the throughput trend, the %CPU use trend and the context 
switch rate trend.  More information would be useful but the test is validated 
by the evidence that it is held up by lock contention.
  

Er ... *what* evidence?  There might be evidence somewhere that proves
that, but Jignesh hasn't shown it.  The available data suggests that the
first-order performance limiter in this test is something else.
Otherwise it should be possible to max out the performance with a lot
less than 1000 active backends.



With 200ms of think times as Jignesh just said, 1000 users does not
equate 1000 active backends.  (It's probably closer to 100 backends,
given an avg. response time of ~20ms)

Something that might be useful for him to report is the avg number of
active backends for each data point ...
  
Short of doing select * from pg_stat_activity and removing the IDLE 
entries, is there any other clean way to get that information? If there 
is no other latency, then active backends should be about active users * 
10ms/200ms, or active users/20, on average. However, the number is still 
lower than that, since an active user can also be waiting for locks, 
either on CPU (spinning) or sleeping (shown by the increase in the 
average response time of execution, which includes the wait).
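The back-of-the-envelope arithmetic above can be written down as follows (a sketch; the 10ms service time is the assumption stated in the text):

```python
def expected_active_backends(active_users, service_ms=10.0, think_ms=200.0):
    """Each user spends roughly service_ms out of every think_ms
    executing, so on average about active_users * service_ms/think_ms
    backends are busy at once (ignoring lock waits, which push the
    real number lower still)."""
    return active_users * service_ms / think_ms
```

At 1000 users this gives 50 active backends; plugging in Alvaro's ~20ms average response time gives about 100, matching his estimate.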


Also, to date I am primarily interested in active backends that are 
waiting to acquire locks, since I find that making that path more 
efficient gives me the biggest return on my buck: lower response time 
and higher throughput.


-Jignesh



[PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-11 Thread Jignesh K. Shah

Hello All,

As you know, one of the things I have constantly been doing is using 
benchmark kits to see how we can scale PostgreSQL on the UltraSPARC T2 
based 1-socket (64 threads) and 2-socket (128 threads) servers that Sun 
sells.


During last PgCon 2008 
http://www.pgcon.org/2008/schedule/events/72.en.html you might remember 
that I mentioned that ProcArrayLock is pretty hot when you have many users.


Rerunning similar tests on a 64-thread UltraSPARC T2plus based server 
config, I found that even with the 8.4 snapshot I took, I was still 
having similar problems (IO is not a problem... all in RAM .. no disks):

Time:Users:Type:TPM: Response Time
60: 100: Medium Throughput: 10552.000 Avg Medium Resp: 0.006
120: 200: Medium Throughput: 22897.000 Avg Medium Resp: 0.006
180: 300: Medium Throughput: 33099.000 Avg Medium Resp: 0.009
240: 400: Medium Throughput: 44692.000 Avg Medium Resp: 0.007
300: 500: Medium Throughput: 56455.000 Avg Medium Resp: 0.007
360: 600: Medium Throughput: 67220.000 Avg Medium Resp: 0.008
420: 700: Medium Throughput: 77592.000 Avg Medium Resp: 0.009
480: 800: Medium Throughput: 87277.000 Avg Medium Resp: 0.011
540: 900: Medium Throughput: 98029.000 Avg Medium Resp: 0.012
600: 1000: Medium Throughput: 102547.000 Avg Medium Resp: 0.023
660: 1100: Medium Throughput: 100503.000 Avg Medium Resp: 0.044
720: 1200: Medium Throughput: 99506.000 Avg Medium Resp: 0.065
780: 1300: Medium Throughput: 95474.000 Avg Medium Resp: 0.089
840: 1400: Medium Throughput: 86254.000 Avg Medium Resp: 0.130
900: 1500: Medium Throughput: 91947.000 Avg Medium Resp: 0.139
960: 1600: Medium Throughput: 94838.000 Avg Medium Resp: 0.147
1020: 1700: Medium Throughput: 92446.000 Avg Medium Resp: 0.173
1080: 1800: Medium Throughput: 91032.000 Avg Medium Resp: 0.194
1140: 1900: Medium Throughput: 88236.000 Avg Medium Resp: 0.221
runDynamic: uCount =  2000delta = 1900
runDynamic: ALL Threads Have Been created
1200: 2000: Medium Throughput: -1352555.000 Avg Medium Resp: 0.071
1260: 2000: Medium Throughput: 88872.000 Avg Medium Resp: 0.238
1320: 2000: Medium Throughput: 88484.000 Avg Medium Resp: 0.248
1380: 2000: Medium Throughput: 90777.000 Avg Medium Resp: 0.231
1440: 2000: Medium Throughput: 90769.000 Avg Medium Resp: 0.229

You will notice that throughput drops around 1000 users.. Nothing new; 
you have already heard me mention that a zillion times..
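One crude way to locate that knee programmatically over the throughput table above (a sketch; the 10% tolerance is an arbitrary choice):

```python
def first_scaling_knee(points, tolerance=0.10):
    """points: (users, tpm) pairs in increasing user order. Returns the
    first user count whose per-user throughput falls more than
    `tolerance` below the best per-user rate seen so far, or None if
    the curve stays (roughly) linear."""
    best = 0.0
    for users, tpm in points:
        rate = tpm / users
        if best and rate < best * (1.0 - tolerance):
            return users
        best = max(best, rate)
    return None
```

Fed a subset of the numbers above, it flags 1100 users, right where the throughput curve turns over.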


Now, while working on this today, I was going through LWLockRelease, as 
I have probably done quite a few times before, to see what can be done. 
The quick synopsis: LWLockRelease releases the lock and wakes up the 
next waiter to take over. If the next waiter is waiting for exclusive 
access, it wakes only that waiter; if the next waiter is waiting for 
shared access, it wakes up all the following shared waiters as well.


Earlier last year I had tried various ways of doing intelligent wakeups 
(finding all the shared waiters together and waking them up; defining a 
new lock mode and waking multiple waiters up simultaneously), but none 
of them were stellar enough to make an impact..


Today I tried something else: forget the distinction between exclusive 
and shared and just wake them all up. So I changed the code from

    /*
     * Remove the to-be-awakened PGPROCs from the queue.  If the front
     * waiter wants exclusive lock, awaken him only. Otherwise awaken
     * as many waiters as want shared access.
     */
    proc = head;
    if (!proc->lwExclusive)
    {
        while (proc->lwWaitLink != NULL &&
               !proc->lwWaitLink->lwExclusive)
            proc = proc->lwWaitLink;
    }
    /* proc is now the last PGPROC to be released */
    lock->head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;
    /* prevent additional wakeups until retryer gets to run */
    lock->releaseOK = false;


to basically wake them all up:

    /*
     * Remove the to-be-awakened PGPROCs from the queue.  If the front
     * waiter wants exclusive lock, awaken him only. Otherwise awaken
     * as many waiters as want shared access.
     */
    proc = head;
    //if (!proc->lwExclusive)
    if (1)
    {
        while (proc->lwWaitLink != NULL &&
               1)   // !proc->lwWaitLink->lwExclusive)
            proc = proc->lwWaitLink;
    }
    /* proc is now the last PGPROC to be released */
    lock->head = proc->lwWaitLink;

Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-11 Thread Jignesh K. Shah



On 03/11/09 18:27, Kevin Grittner wrote:
Jignesh K. Shah j.k.s...@sun.com wrote: 


Rerunning similar tests on a 64-thread UltraSPARC T2plus based
server config

 
  

(IO is not a problem... all in RAM .. no disks):
Time:Users:Type:TPM: Response Time
60: 100: Medium Throughput: 10552.000 Avg Medium Resp: 0.006
120: 200: Medium Throughput: 22897.000 Avg Medium Resp: 0.006
180: 300: Medium Throughput: 33099.000 Avg Medium Resp: 0.009
240: 400: Medium Throughput: 44692.000 Avg Medium Resp: 0.007
300: 500: Medium Throughput: 56455.000 Avg Medium Resp: 0.007
360: 600: Medium Throughput: 67220.000 Avg Medium Resp: 0.008
420: 700: Medium Throughput: 77592.000 Avg Medium Resp: 0.009
480: 800: Medium Throughput: 87277.000 Avg Medium Resp: 0.011
540: 900: Medium Throughput: 98029.000 Avg Medium Resp: 0.012
600: 1000: Medium Throughput: 102547.000 Avg Medium Resp: 0.023

 
I'm wondering about the testing methodology.  If there is no I/O, I

wouldn't expect performance to improve after you have all the CPU
threads busy.  (OK, so there might be some brief blocking that would
make the optimal number of connections somewhat above 64, but 1000???)
 
What's the bottleneck which allows additional connections to improve

the throughput?  Network latency?
 
I'm a lot more interested in what's happening between 60 and 180 than

over 1000, personally.  If there was a RAID involved, I'd put it down
to better use of the numerous spindles, but when it's all in RAM it
makes no sense.
 
-Kevin
  



Kevin,

The problem is that the CPUs are not all busy; there are plenty of idle 
cycles, since PostgreSQL ends up in situations where all the backends 
are waiting to acquire an exclusive lock. In a case with, say, one CPU, 
waking up one or a few waiters is more efficient. However, when you have 
64 or 128 or 256 (as in my case), waking up one waiter is inefficient, 
since only that waiter will be allowed to run, while the other waiters 
still wake up, spin to acquire the lock, find they are still not 
allowed, and go back to sleep..
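A toy model of the wakeup policy makes the effect visible (this models only the FIFO batching described earlier in the thread, not the actual semaphore mechanics):

```python
def wake_batch(queue):
    """queue: 'S' (shared) / 'X' (exclusive) waiters in FIFO order.
    If the front waiter wants exclusive, wake it alone; otherwise
    wake the whole leading run of shared waiters."""
    if not queue:
        return [], []
    if queue[0] == 'X':
        return [queue[0]], queue[1:]
    n = 0
    while n < len(queue) and queue[n] == 'S':
        n += 1
    return queue[:n], queue[n:]

def drain(queue):
    """Release the lock repeatedly and collect each wakeup batch."""
    batches = []
    while queue:
        batch, queue = wake_batch(queue)
        batches.append(batch)
    return batches
```

For a queue like S,S,X,S,X this yields the batches [S,S], [X], [S], [X]: every exclusive waiter gets a release cycle to itself, so on a 128-thread box almost every hardware thread sits idle during those cycles.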


The testing methodology is: given that we can get fast storage, can 
PostgreSQL still scale to use, say, 32, 64, 128, 256 CPUs? I am probably 
just ahead of the curve of widespread usage here, but I want to make 
sure PostgreSQL is well tested for it in advance. And yes, I still have 
plenty of unused CPU, so the goal is to make sure that if the system can 
handle it, so can PostgreSQL.



Regards,
Jignesh



Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-11 Thread Jignesh K. Shah



Tom Lane wrote:

Kevin Grittner kevin.gritt...@wicourts.gov writes:
  

I'm wondering about the testing methodology.



Me too.  This test case seems much too far away from real world use
to justify diddling low-level locking behavior; especially a change
that is obviously likely to have very negative effects in other
scenarios.  In particular, I think it would lead to complete starvation
of would-be exclusive lockers in the face of competition from a steady
stream of shared lockers.  AFAIR the existing behavior was designed
to reduce the odds of that, not for any other purpose.

regards, tom lane

  


Hi Tom,

The test case is not that far-fetched from the real world. Plus, if you 
read my proposal, I clearly mention a tunable for it, so we can leave it 
off and obviously not impact the 99% of people who don't care about it, 
while still allowing flexibility for the 1% who do care about 
scalability when they move to bigger systems. The fact that it is a 
tunable (and obviously not the default) means there is no impact on 
existing behavior.


My test case clearly shows that exclusive lockers DO benefit from it; 
otherwise I would not have seen the huge impact on throughput.


A tunable does not impact existing behavior but adds flexibility for 
those using PostgreSQL on high-end systems. Plus, doing it the tunable 
way in PostgreSQL 8.4 will convince many people I know to quickly adopt 
8.4, just because of the benefit it brings on systems with many 
cpus/cores/threads.


All I am requesting is for the beta to have that tunable. It's not hard; 
people can then quickly try the default (off) or on, or, as Scott Carey 
mentioned, something more flexible: default, all, or a fixed integer 
number (for people to experiment with).



Regards,
Jignesh

--
Jignesh Shah   http://blogs.sun.com/jkshah  
The New Sun Microsystems,Inc   http://sun.com/postgresql


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-11 Thread Jignesh K. Shah



Tom Lane wrote:

Scott Carey sc...@richrelevance.com writes:
  

If there is enough lock contention and a common lock case is a short lived 
shared lock, it makes perfect sense.  Fewer readers are blocked waiting 
on writers at any given time.  Readers can 'cut' in line ahead of writers 
within a certain scope (only up to the number waiting at the time a shared lock 
is at the head of the queue).  Essentially this clumps up shared and exclusive 
locks into larger streaks, and allows for higher shared lock throughput.
Exclusive locks may be delayed, but will NOT be starved, since on the next 
iteration, a streak of exclusive locks will occur first in the list and they 
will all process before any more shared locks can go.



That's a lot of sunny assertions without any shred of evidence behind
them...

The current LWLock behavior was arrived at over multiple iterations and
is not lightly to be toyed with IMHO.  Especially not on the basis of
one benchmark that does not reflect mainstream environments.

Note that I'm not saying no.  I'm saying that I want a lot more
evidence *before* we go to the trouble of making this configurable
and asking users to test it.

regards, tom lane

  
Fair enough.. Well, I am now appealing to all who have fairly 
decent-sized hardware to try it out and see whether there are gains, no 
changes, or regressions based on your workload. It will also help if you 
report the number of CPUs when you respond, to help collect feedback.


Regards,
Jignesh

--
Jignesh Shah   http://blogs.sun.com/jkshah  
The New Sun Microsystems,Inc   http://sun.com/postgresql




Re: [PERFORM] 2GB or not 2GB

2008-05-28 Thread Jignesh K. Shah



Josh Berkus wrote:

Folks,

Subsequent to my presentation of the new annotated.conf at pgCon last week, 
there's been some argument about the utility of certain memory settings 
above 2GB.  I'd like to hash those out on this list so that we can make 
some concrete recomendations to users.


shared_buffers:  according to witnesses, Greg Smith presented at East that 
based on PostgreSQL's buffer algorithms, buffers above 2GB would not 
really receive significant use.  However, Jignesh Shah has tested that on 
workloads with large numbers of connections, allocating up to 10GB 
improves performance. 
  
I have certainly seen improvements in performance up to 10GB using 
EAStress. The delicate balance is between the file system cache and 
shared buffers. I think the initial gigabytes are more beneficial at the 
shared buffers level, and after that in the file system cache.
I am trying to remember Greg's presentation, where I think he suggested 
more like 50% of available RAM (e.g., on a 4GB system used just for 
PostgreSQL, it may not help to set more than 2GB, since you need memory 
for other stuff also).. Right, Greg?


But if you have 32GB of RAM, I don't mind allocating 10GB to PostgreSQL, 
beyond which I find lots of other things begin to have an impact..


BTW, I am really +1 for just setting an AvailRAM tunable for PostgreSQL 
(like the example you showed in the tutorials) and deriving defaults for 
all other settings from it unless overridden manually.  That way people 
don't forget to bump up wal_buffers or one of the others while bumping 
the rest, and then fight to figure out why the hell they are not seeing 
what they are expecting.
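A hypothetical sketch of what that derivation could look like (the fractions are illustrative rules of thumb, not values proposed in this thread):

```python
def derive_settings(avail_ram_mb):
    """Derive the memory-related GUCs from a single AvailRAM knob."""
    return {
        "shared_buffers_mb": avail_ram_mb // 4,            # ~25% of AvailRAM
        "effective_cache_size_mb": avail_ram_mb * 3 // 4,  # rest left to OS cache
        "wal_buffers_mb": min(16, max(1, avail_ram_mb // 1024)),
        "maintenance_work_mem_mb": min(1024, avail_ram_mb // 16),
    }
```

The point is not the exact fractions but that one knob moves all the dependent settings together, so none of them gets forgotten.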


-Jignesh





Re: [PERFORM] 2GB or not 2GB

2008-05-28 Thread Jignesh K. Shah



Greg Smith wrote:

On Wed, 28 May 2008, Josh Berkus wrote:

shared_buffers:  according to witnesses, Greg Smith presented at East 
that

based on PostgreSQL's buffer algorithms, buffers above 2GB would not
really receive significant use.  However, Jignesh Shah has tested 
that on

workloads with large numbers of connections, allocating up to 10GB
improves performance.


Lies!  The only upper-limit for non-Windows platforms I mentioned was 
suggesting those recent tests at Sun showed a practical limit in the 
low multi-GB range.


I've run with 4GB usefully for one of the multi-TB systems I manage, 
the main index on the most frequently used table is 420GB and anything 
I can do to keep the most popular parts of that pegged in memory seems 
to help. I haven't tried to isolate the exact improvement going from 
2GB to 4GB with benchmarks though.


Yep, it's always the index that seems to benefit from high cache hit 
rates. In one of the recent tests, what I ended up doing was running a 
select count(*) from trade where t_id >= $1 and t_id < SOMEMAX just to 
kick in an index scan and get the index into memory first. So the higher 
the buffer pool, the better the hit rate for the index in it, and the 
better the performance.


-Jignesh








Re: [PERFORM] Benchmarks WAS: Sun Talks about MySQL

2008-05-01 Thread Jignesh K. Shah


Joshua D. Drake wrote:

On Mon, 28 Apr 2008 14:40:25 -0400
Gregory Stark [EMAIL PROTECTED] wrote:


  

We certainly can pass TPC-C. I'm curious what you mean by 1/4 though?
On similar hardware? Or the maximum we can scale to is 1/4 as large
as Oracle? Can you point me to the actual benchmark runs you're
referring to?



I would be curious as well considering there has been zero evidence
provided to make such a statement. I am not saying it isn't true, it
wouldn't be surprising to me if Oracle outperformed PostgreSQL in TPC-C
but I would sure like to see in general how well we do (or don't).


Sincerely,

Joshua D. Drake
  


I am sorry, but I am far behind on my emails:

The best thing is to work with TPC-E benchmarks involving the community. 
(TPC-C's storage requirements are way too high, and everybody seems to 
be slowly getting on the TPC-E bandwagon.)


Where can I get the latest DBT5 (TPC-E) kit? Using the kit should allow 
me to recreate setups, which can then be made available for various 
PostgreSQL performance engineers to look at.




Regards,
Jignesh







Re: [PERFORM] 12 disks raid setup

2008-02-29 Thread Jignesh K. Shah



Franck Routier wrote:

Hi,

I am in the process of setting up a postgresql server with 12 SAS disks.

I am considering two options:

1) set up a 12 disks raid 10 array to get maximum raw performance from
the system and put everything on it (it the whole pg cluster, including
WAL, and every tablespcace)

2) set up 3 raid 10 arrays of 4 disks, and dispatch my data on these
disks via tablespaces :

data1 = pg cluster + references data (dimensions) tablespace
data2 = fact data tablespace
data3 = indices tablespace


  


Option 2. In fact, within one of the RAID1 pairs I would also use 
another soft partition and separate out pg_xlog as well.


My 2 cents based on my benchmarks.

-Jignesh


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] shared_buffers in 8.3 w/ lots of RAM on dedicated PG machine

2008-02-15 Thread Jignesh K. Shah



Greg Smith wrote:

On Fri, 15 Feb 2008, Peter Schuller wrote:


Or is it a matter of PostgreSQL doing non-direct I/O, such that
anything cached in shared_buffers will also be cached by the OS?


PostgreSQL only uses direct I/O for writing to the WAL; everything 
else goes through the regular OS buffer cache unless you force it to 
do otherwise at the OS level (like some Solaris setups do with 
forcedirectio).  This is one reason it still may not make sense to 
give an extremely high percentage of RAM to PostgreSQL even with 
improvements in managing it.  Another is that shared_buffers memory 
has to be reconciled with disk at every checkpoint, where OS buffers 
do not.  A third is that your OS may just be more efficient at 
buffering--it knows more about the underlying hardware, and the 
clock-sweep method used internally by PostgreSQL to simulate a LRU 
cache is not extremely sophisticated.


However, don't feel limited by the general 25% rule; it's certainly 
worth exploring whether 50% or more works better for your workload.  
You'll have to benchmark that yourself though, and I'd suggest using 
pg_buffercache: 
http://www.postgresql.org/docs/8.3/static/pgbuffercache.html to get an 
idea just what the pages are being used for.




As per the tests that I have done, mostly with forcedirectio on Solaris, 
I have seen gains when increasing the buffer cache up to somewhere 
around 10GB, and that's when things seem to take a turn...

So in my case I am generally comfortable with Postgres using about 
8-10GB, beyond which I am cautious.


Also, in tests comparing UFS buffered for table/index against 
forcedirectio, it seems to perform better with forcedirectio. However, 
if you do want to exploit the extra RAM with UFS, then you have to do 
some UFS tuning in Solaris. Now that machines with 32GB are becoming 
common, this is worth pursuing, depending on whether the storage can 
handle the directio load or not.



Regards,
Jignesh






Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Jignesh K. Shah



Greg Smith wrote:

On Wed, 6 Feb 2008, Simon Riggs wrote:


For me, it would be good to see a --parallel=n parameter that would
allow pg_loader to distribute rows in round-robin manner to n
different concurrent COPY statements. i.e. a non-routing version.


Let me expand on this.  In many of these giant COPY situations the 
bottleneck is plain old sequential I/O to a single process.  You can 
almost predict how fast the rows will load using dd.  Having a process 
that pulls rows in and distributes them round-robin is good, but it 
won't crack that bottleneck.  The useful approaches I've seen for 
other databases all presume that the data files involved are large 
enough that on big hardware, you can start multiple processes running 
at different points in the file and beat anything possible with a 
single reader.


If I'm loading a TB file, odds are good I can split that into 4 or 
more vertical pieces (say rows 1-25%, 25-50%, 50-75%, 75-100%), start 
4 loaders at once, and get way more than 1 disk worth of throughput 
reading.  You have to play with the exact number because if you push 
the split too far you introduce seek slowdown instead of improvements, 
but that's the basic design I'd like to see one day.  It's not 
parallel loading that's useful for the cases I'm thinking about until 
something like this comes around.




Some food for thought here: most BI-type applications that do data 
conversion/cleansing also end up sorting the data before it is loaded 
into the database, so starting parallel loaders at totally different 
points in the file ruins that effort. A more pragmatic approach would be 
to read the next rows from the input file: if there are N parallel 
streams, each one is offset by 1 from the others and jumps ahead by N 
rows, so the seeks are narrowed down to a few rows (ideally 1) instead 
of jumping 1/Nth of the file every time a read happens.
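A sketch of that interleaving (a hypothetical helper; a real loader would hand out blocks, not Python lines):

```python
def stream_rows(path, stream_id, n_streams):
    """Stream k takes rows k, k+N, k+2N, ... so all N streams advance
    through the same region of the file together and the per-stream
    seek distance stays small."""
    with open(path) as f:
        for i, line in enumerate(f):
            if i % n_streams == stream_id:
                yield line
```

Contrast with the block-split scheme, where stream k starts at offset k/N of the file and the streams fight over far-apart disk regions.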


For example, to replicate this with dd and see the impact, use a big 
file with the seek option and different block sizes. Somebody out here 
once did that test and showed that seek time on the file being read is 
reduced significantly; and depending on the file system, with 
intelligent prefetching (which unfortunately UFS on Solaris does not do 
well by default), all the reads for the next stream will already be in 
memory.




Regards,
Jignesh




Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Jignesh K. Shah
One of the problems with the empty-table optimization is that once 
indexes are created, the table is no longer considered empty.


Commercial databases have options like an IRRECOVERABLE clause along 
with DISK PARTITIONS and CPU partitions for their bulk loaders.

So one option turns off logging; disk partitions create multiple 
processes to read various lines/blocks from the input file and to flush 
dirty blocks from the buffer pools to disk; and CPU partitions process 
the various blocks/lines read, checking their format and putting the 
rows into the buffer pool if successful.


Regards,
Jignesh

Simon Riggs wrote:

On Tue, 2008-02-05 at 15:05 +, Richard Huxton wrote:

  

Only by locking the table, which serializes access, which then slows you
down or at least restricts other options. Plus if you use pg_loader then
you'll find only the first few rows optimized and all the rest not.
  
Hmm - the table-locking requirement is true enough, but why would 
pg_loader cause problems after the first few rows?



It runs a stream of COPY statements, so only first would be optimized
with the empty table optimization.

  




Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Jignesh K. Shah

Hi Heikki,

Is there a way such an operation can be spawned as a worker process? 
Generally such loading is done during off-peak hours, when I expect 
additional CPU resources to be available. By delegating the additional 
work to worker processes, we should be able to capitalize on the 
additional cores in the system.

Even with a single core, the mere fact that the loading process will 
eventually block on reads from the input file means the OS can timeslice 
it well, letting a second process use those wait times for the 
index-population work.


What do you think?


Regards,
Jignesh


Heikki Linnakangas wrote:

Dimitri Fontaine wrote:

On Tuesday 05 February 2008, Simon Riggs wrote:

I'll look at COPY FROM internals to make this faster. I'm looking at
this now to refresh my memory; I already had some plans on the shelf.


Maybe stealing some ideas from pg_bulkload could somewhat help here?
  
http://pgfoundry.org/docman/view.php/1000261/456/20060709_pg_bulkload.pdf 



IIRC it's mainly about how to optimize index updating while loading 
data, and I've heard complaints on the line this external tool has 
to know too much about PostgreSQL internals to be trustworthy as 
non-core code... so...


I've been thinking of looking into that as well. The basic trick 
pg_bulkload is using is to populate the index as the data is being 
loaded. There's no fundamental reason why we couldn't do that 
internally in COPY. Triggers or constraints that access the table 
being loaded would make it impossible, but we should be able to detect 
that and fall back to what we have now.


What I'm basically thinking about is to modify the indexam API of 
building a new index, so that COPY would feed the tuples to the 
indexam, instead of the indexam opening and scanning the heap. The 
b-tree indexam would spool the tuples into a tuplesort as the COPY 
progresses, and build the index from that at the end as usual.






Re: [PERFORM] Benchmark Data requested

2008-02-05 Thread Jignesh K. Shah
Commercial DB bulk loaders work the same way: they give you a 
fast-loader option where, in case of error, the whole table is 
truncated.  I think this also has real-life advantages where PostgreSQL 
is used for data marts that are recreated every now and then from other 
systems and want fast loads. So it's not just benchmarking folks like me 
who will take advantage of such features. IN FACT I have seen loaders 
force a REPLACE TABLE clause, in the sense that it will truncate the 
table before loading, so there is no confusion about what happens to the 
original data in the table, and only then does it avoid the logs.



To be honest, it's not the WAL writes to disk that I am worried about. 
According to my tests, async_commit comes pretty close to sync=off and 
solves the WALWriteLock contention. We should maybe just focus on making 
it more efficient, which I think also involves WALInsertLock, which may 
not be entirely efficient.



Also, all changes have to be add-on options and not replacements for 
existing load paths; I totally agree with that point. The guys in 
production support don't even like optimizer query plan changes, forget 
a corrupt index. (I have spent two days in a previous role trying to 
figure out why a particular query plan on another database changed in 
production.)







Simon Riggs wrote:

On Tue, 2008-02-05 at 13:47 -0500, Jignesh K. Shah wrote:
  

That sounds cool to me too..

How much work is it to make pg_bulkload work on 8.3? An integrated 
version is certainly more beneficial.



  
Especially, I think it will also help other setups like TPC-E where 
this is a problem.

   
If you don't write WAL then you can lose all your writes in a crash.

That issue is surmountable on a table with no indexes, or even
conceivably with one monotonically ascending index. With other indexes
if we crash then we have a likely corrupt index.

For most production systems I'm aware of, losing an index on a huge
table is not anything you'd want to trade for performance. Assuming
you've ever been knee-deep in it on a real server.

Maybe we can have a load mode for a table where we skip writing any
WAL, but if we crash we just truncate the whole table to nothing? Issue
a WARNING if we enable this mode while any data in table. I'm nervous of
it, but maybe people really want it?

I don't really want to invent ext2 all over again, so we have to run an
fsck on a table if we crash while loading. My concern is that many
people would choose that then blame us for delivering unreliable
software. e.g. direct path loader on Oracle used to corrupt a PK index
if you loaded duplicate rows with it (whether it still does I couldn't
care). That kind of behaviour is simply incompatible with production
usage, even if it does well in benchmarks.

  




Re: [PERFORM] Benchmark Data requested

2008-02-04 Thread Jignesh K. Shah

Doing it at low scale factors is not attractive.

Commercial databases are publishing at scale factor 1000 (about 1TB) to 
10000 (10TB), with one at 30TB. So ideally, tuning right now should 
start at scale factor 1000.


Unfortunately I have tried that before with PostgreSQL the few of the 
problems are as follows:


PostgreSQL's single-stream loader takes hours to load the data (single 
stream load... wasting all the extra cores out there).

Multiple table loads (one per table) spawned via script are a bit 
better, but hit WAL problems.

To avoid the WAL problems, I created the tables and the load statements 
within the same transaction: faster, but you cannot create an index 
before the load or it starts writing to WAL... AND if the indexes are 
created after the load, it takes about a day or so to create all the 
indices required. (Index creation is single threaded, and creating 
multiple indexes at the same time could run out of temporary disk space, 
since the tables are so big. Which means one thread at a time is the 
answer for tables that are really big. It is slow.)


Boy, by this time most people doing TPC-H at the high end give up on 
PostgreSQL.


I have not even started partitioning the tables yet, since with the 
current framework you have to load the data separately into each child 
table, which means for the TPC-H data you need extra logic to take the 
table data and split it into each partition child table. Not stuff that 
many people want to do by hand.


Then there is the power run, which essentially runs one query at a time 
and should be able to utilize the full system (especially on multi-core 
systems); unfortunately, PostgreSQL can use only one core. (Plus, since 
this is read-only and there is no separate disk reader, all other 
processes are idle.) So the system runs at 1/Nth capacity (where N is the 
number of cores/threads).


(I am not sure here with Partitioned tables, do you get N processes 
running in the system when you scan the partitioned table?)


Even off-loading work like fetching the data into the bufferpool to 
separate processes would be a big win for this type of workload.


I would be happy to help out if folks here want to do work related to 
it. In fact, if you have time, I can request a project in one of the Sun 
benchmarking centers to see what we can learn with community members 
interested in understanding where PostgreSQL performs well and where it fails.


Regards,
Jignesh

Greg Smith wrote:

On Mon, 4 Feb 2008, Simon Riggs wrote:


Would anybody like to repeat these tests with the latest production
versions of these databases (i.e. with PGSQL 8.3)


Do you have any suggestions on how people should run TPC-H?  It looked 
like a bit of work to sort through how to even start this exercise.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Benchmark Data requested

2008-02-04 Thread Jignesh K. Shah

Hi Simon,

I have some insight into TPC-H on how it works.

First of all, I think it is a violation of TPC rules to publish numbers 
without auditing them first. So even if I do the test to show the 
better performance of PostgreSQL 8.3, I cannot post it here or in any 
public forum without going through that process. (Even though it 
is a partial benchmark, as they are just doing the equivalent of the 
PowerRun of TPC-H.) Maybe the PR team of PostgreSQL should email 
[EMAIL PROTECTED] about them and see what they have to say about that comparison.


On the technical side:

Remember that all TPC-H queries, when run sequentially on PostgreSQL, use 
only one core or virtual CPU, so the benchmark makes very poor use of the 
system with PostgreSQL (same for MySQL too).


Also, very important: unless you are running the UPDATE FUNCTIONS, which 
are separate queries, all of the Q1-Q22 queries are pure READ-ONLY 
queries. Traditionally, I think PostgreSQL lacks READ-SPEED, 
especially since it is bottlenecked by the size of the reads it does 
(BLOCKSIZE). Major databases provide multi-block parameters to do 
reads/writes in multiples of the blocksize to reduce IOPS, and for 
read-only workloads they also have a READ-AHEAD or prefetch size, 
generally bigger than the multi-block or extent size, to aid reads.


Scale factor is in terms of gigabytes, and hence using the max scale of 5 
(5GB) is pretty useless, since most of the rows could be cached in modern 
systems. And comparing with 0.01 is what, 10MB? The L2 cache of a recent 
Intel CPU is probably bigger than that.


If you are doing tuning for the TPC-H queries, focus on a few of them. 
For example, Query 1 is very join-intensive, and if your CPU is not 100% 
used then you have a problem in your I/O to solve before tuning it.


Another example: Query 16 is literally an I/O scan-speed test; many people 
use it to see if the database can scan at the line speed of the storage. 
Ending up at 100% CPU means the database cannot process that many rows 
(just to bring them in).


In essence, each query exercises some combination of system features to 
highlight performance. However, since it is an old benchmark, 
database companies end up re-engineering their technologies to gain 
advantage in it. (Hence it is time for a successor, the in-progress 
TPC-DS, which will have more than 100 such queries.)


A few of the technologies that have really helped gain ground in the TPC-H world:
* Hash and/or range partitioning of tables (PostgreSQL 8.3 can do that, 
but the setup cost of writing the schema is high, especially since data 
has to be loaded into separate tables)
* Automated aggregated views, used by optimizers: database technology 
to keep the more frequently used aggregations up to date in smaller views
* Cube-view indexes, like bitmap but multidimensional (I think... but not 
sure)


That said, is it useful for regression testing in the PostgreSQL build 
farms? I would think yes... especially Q16.


Hope this helps.
Regards,
Jignesh




Simon Riggs wrote:

Can I ask for some help with benchmarking?

There are some results here that show PostgreSQL is slower in some cases
than Monet and MySQL. Of course these results were published immediately
prior to 8.2 being released, plus run out-of-the-box, so without even
basic performance tuning.

Would anybody like to repeat these tests with the latest production
versions of these databases (i.e. with PGSQL 8.3), and with some
sensible tuning settings for the hardware used? It will be useful to get
some blind tests with more sensible settings.

http://monetdb.cwi.nl/projects/monetdb//SQL/Benchmark/TPCH/

Multiple runs from different people/different hardware is useful since
they help to iron-out differences in hardware and test methodology. So
don't worry if you see somebody else doing this also.

Thanks,

  


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Benchmark Data requested

2008-02-04 Thread Jignesh K. Shah



Gregory Stark wrote:

Incidentally we found some cases that Solaris was particularly bad at. Is
there anybody in particular that would be interested in hearing about them?
(Not meant to be a knock on Solaris, I'm sure there are other cases Linux or
BSD handle poorly too)


  


Send me the details; I can file bugs for Solaris on behalf of the 
community. Since I am involved in a lot of PostgreSQL testing on Solaris 
this year, I have a small list myself (mostly related to file system 
stuff, though).


I know of one regarding bonnie rewriting blocks that you sent out. (I still 
haven't done anything about it yet, but I finally have some test machines 
for such work instead of using my workstation to test it out. :-)


But I am really interested in seeing which ones hit PostgreSQL 
performance/usability.


Thanks in advance.

Regards,
Jignesh


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Benchmark Data requested

2008-02-04 Thread Jignesh K. Shah


TPC-H has two runs:
the PowerRun, which is a single stream (Q1-22, RF1, RF2),
and the Throughput Run, which has N streams (N depends on scale) running 
simultaneously, each a mixed sequence of the same queries plus the two 
update functions. During the throughput run you can expect to max out the CPUs. 
But commercial databases generally have PowerRuns running quite well even 
on multi-core systems (Oracle, without RAC, has published with 144 cores on 
Solaris).


As for the I/O system saturating the CPU, it is two-fold:
the kernel fetching in the data, which saturates at some value,
and in this case PostgreSQL reading the data and putting it into its 
bufferpool.


An example of how I use it is as follows:
do a SELECT query on a table such that it results in a table scan without 
actually returning any rows back,
then keep throwing hardware (better storage) at it until it saturates the CPU. 
That is the practical maximum you can do with that CPU/OS combination 
(assuming unlimited storage bandwidth). This is primarily used for 
guessing how fast one of the TPC-H queries will complete.


In my tests with PostgreSQL, I generally reach the CPU limit without 
even reaching the bandwidth of the underlying storage.

Just to give numbers:
A single 2Gb Fibre Channel port can practically go up to 180 MB/sec.
Single 4Gb ports have proven to go up to 360-370 MB/sec.
So to saturate one FC port, PostgreSQL has to be able to scan 370 MB/sec 
without saturating the CPU.
Then comes software striping, which stripes over multiple ports, 
increasing the available bandwidth... Now the scan has to be 
able to drive N x 370 MB/sec (all on a single core).


I had some numbers, and I had some limitations based on CPU frequency, 
blocksize, etc., but those were from the 8.1 days or so.


I think to take PostgreSQL a bit higher-end, we have to first scale out 
these numbers.
Doing some sort of test in the PostgreSQL build farms for every release 
would actually help people see the amount of data that it can drive through...


We could also define some database-operation metrics to gauge 
how much each release improves over older releases. I have ideas 
for a few of them.


Regards,
Jignesh


Gregory Stark wrote:

Jignesh K. Shah [EMAIL PROTECTED] writes:

  

Then for the power run that is essentially running one query at a time should
essentially be able to utilize the full system (specially multi-core systems),
unfortunately PostgreSQL can use only one core. (Plus since this is read only
and there is no separate disk reader all other processes are idle) and system
is running at 1/Nth capacity (where N is the number of cores/threads)



Is the whole benchmark like this or is this just one part of it?

Is the i/o system really able to saturate the cpu though?

  


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris

2007-10-26 Thread Jignesh K. Shah
I agree with Tom... somehow I think increasing NUM_CLOG_BUFFERS just 
pushes the symptom out to a later value. I promise to look more into it 
before making any recommendations to increase NUM_CLOG_BUFFERS.



Because though iGen showed improvements in that area from increasing 
num_clog_buffers, EAStress showed no improvement. Plus, the reason 
I think this is not the problem in 8.3beta1 is that the lock output 
clearly does not show CLogControlLock as the issue, which I had 
seen in the earlier case. So I don't think that increasing NUM_CLOG_BUFFERS 
will change things here.


Now, I don't understand the code very well yet, but I see three hotspots and 
am not sure if they are related to each other:
* ProcArrayLock waits - causing waits, as reported by the 
83_lwlock_wait.d script
* SimpleLruReadPage - causing read I/Os, as reported by 
iostat/rsnoop.d

* GetSnapshotData - causing CPU utilization, as reported by hotuser

But I will shut up and do more testing.

Regards,
Jignesh



Tom Lane wrote:

Josh Berkus [EMAIL PROTECTED] writes:
  
Actually, 32 made a significant difference as I recall ... do you still have 
the figures for that, Jignesh?



I'd want to see a new set of test runs backing up any call for a change
in NUM_CLOG_BUFFERS --- we've changed enough stuff around this area that
benchmarks using code from a few months back shouldn't carry a lot of
weight.

regards, tom lane
  


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris

2007-10-26 Thread Jignesh K. Shah
The problem I saw was first highlighted by EAStress runs with PostgreSQL 
on Solaris with 120-150 users. I just replicated it via the smaller 
internal benchmark that we use here to recreate the problem.


EAStress should be just fine to highlight it. Just put pg_clog on 
O_DIRECT or something so that all I/Os go to disk, making it easier to 
observe.


In the meanwhile I will try to get more information.


Regards,
Jignesh


Tom Lane wrote:

Gregory Stark [EMAIL PROTECTED] writes:
  

Didn't we already go through this? He and Simon were pushing to bump up
NUM_CLOG_BUFFERS and you were arguing that the test wasn't representative and
some other clog.c would have to be reengineered to scale well to larger
values. 



AFAIR we never did get any clear explanation of what the test case is.
I guess it must be write-mostly, else lazy XID assignment would have
helped this by reducing the rate of XID consumption.

It's still true that I'm leery of a large increase in the number of
buffers without reengineering slru.c.  That code was written on the
assumption that there were few enough buffers that a linear search
would be fine.  I'd hold still for 16, or maybe even 32, but I dunno
how much impact that will have for such a test case.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match
  


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris

2007-10-26 Thread Jignesh K. Shah

Tom,

Here is what I did:

I started aggregating all read information:

First, I also grouped by pid (arg0, arg1, pid), and the counts all came 
out as 1.


Then I grouped just by filename and location (arg0, arg1 of the reads), and 
the counts came back as:


# cat read.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
   @read[fds[arg0].fi_pathname, arg1] = count();
}


# ./read.d
dtrace: script './read.d' matched 1 probe
^C

 /export/home0/igen/pgdata/pg_clog/0014   
-2753028293472    1
 /export/home0/igen/pgdata/pg_clog/0014   
-2753028277088    1
 /export/home0/igen/pgdata/pg_clog/0015   
-2753028244320    2
 /export/home0/igen/pgdata/pg_clog/0015   
-2753028268896   14
 /export/home0/igen/pgdata/pg_clog/0015   
-2753028260704   25
 /export/home0/igen/pgdata/pg_clog/0015   
-2753028252512   27
 /export/home0/igen/pgdata/pg_clog/0015   
-2753028277088   28
 /export/home0/igen/pgdata/pg_clog/0015   
-2753028293472   37



FYI, I pressed Ctrl-C within less than a second.

So to me this suggests that multiple processes are reading the same page 
from different pids. (This was with about 600 users active.)


Apparently we do have a problem in that we are reading the same buffer 
address again. (Same as it not being cached anywhere, or not being found in 
any cache.)


I re-ran the lock wait script on a couple of processes and did not see 
CLogControlLock as a problem:


# ./83_lwlock_wait.d 14341

Lock IdMode   Count
  WALInsertLock   Exclusive   1
  ProcArrayLock   Exclusive  16

Lock Id   Combined Time (ns)
  WALInsertLock   383109
  ProcArrayLock198866236

# ./83_lwlock_wait.d 14607

Lock IdMode   Count
  WALInsertLock   Exclusive   2
  ProcArrayLock   Exclusive  15

Lock Id   Combined Time (ns)
  WALInsertLock55243
  ProcArrayLock 69700140

#

What will help you find out why it is reading the same page again?


-Jignesh



Jignesh K. Shah wrote:
I agree with Tom... somehow I think increasing NUM_CLOG_BUFFERS just 
pushes the symptom out to a later value. I promise to look more 
into it before making any recommendations to increase NUM_CLOG_BUFFERS.



Because though iGen showed improvements in that area from increasing 
num_clog_buffers, EAStress showed no improvement. Plus, the 
reason I think this is not the problem in 8.3beta1 is that the lock 
output clearly does not show CLogControlLock as the issue, which 
I had seen in the earlier case. So I don't think that increasing 
NUM_CLOG_BUFFERS will change things here.


Now, I don't understand the code very well yet, but I see three hotspots 
and am not sure if they are related to each other:
* ProcArrayLock waits - causing waits, as reported by the 
83_lwlock_wait.d script
* SimpleLruReadPage - causing read I/Os, as reported by 
iostat/rsnoop.d

* GetSnapshotData - causing CPU utilization, as reported by hotuser

But I will shut up and do more testing.

Regards,
Jignesh



Tom Lane wrote:

Josh Berkus [EMAIL PROTECTED] writes:
 
Actually, 32 made a significant difference as I recall ... do you 
still have the figures for that, Jignesh?



I'd want to see a new set of test runs backing up any call for a change
in NUM_CLOG_BUFFERS --- we've changed enough stuff around this area that
benchmarks using code from a few months back shouldn't carry a lot of
weight.

regards, tom lane
  


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris

2007-10-26 Thread Jignesh K. Shah

Also to give perspective on the equivalent writes on CLOG

I used the following script, which runs for 10 seconds, to track all writes 
to the clog directory, and here is what it came up with (this is with 500 
users running):


# cat write.d
#!/usr/sbin/dtrace -s
syscall::write:entry
/execname=="postgres" &&
dirname(fds[arg0].fi_pathname)=="/export/home0/igen/pgdata/pg_clog"/
{
   @write[fds[arg0].fi_pathname,arg1] = count();
}
tick-10sec
{
exit(0);
}

# ./write.d
dtrace: script './write.d' matched 2 probes
CPU IDFUNCTION:NAME
 3   1026  :tick-10sec

 /export/home0/igen/pgdata/pg_clog/001E   
-2753028277088    1

#
I modified read.d to do a 5-second read:
# ./read.d
dtrace: script './read.d' matched 3 probes
CPU IDFUNCTION:NAME
 0  1   :BEGIN
 0   1027   :tick-5sec

 /export/home0/igen/pgdata/pg_clog/001F   
-2753028268896    1
 /export/home0/igen/pgdata/pg_clog/001F   
-2753028252512    1
 /export/home0/igen/pgdata/pg_clog/001F   
-2753028285280    2
 /export/home0/igen/pgdata/pg_clog/001F   
-2753028277088    3
 /export/home0/igen/pgdata/pg_clog/001F   
-2753028236128    3
 /export/home0/igen/pgdata/pg_clog/001E   
-2753028285280    5
 /export/home0/igen/pgdata/pg_clog/001E   
-2753028236128    9
 /export/home0/igen/pgdata/pg_clog/001E   
-2753028277088   13
 /export/home0/igen/pgdata/pg_clog/001E   
-2753028268896   15
 /export/home0/igen/pgdata/pg_clog/001E   
-2753028252512   27

#

So the ratio of reads vs writes to clog files is pretty huge..


-Jignesh



Jignesh K. Shah wrote:

Tom,

Here is what I did:

I started aggregating all read information:

First, I also grouped by pid (arg0, arg1, pid), and the counts all came 
out as 1.


Then I grouped just by filename and location (arg0, arg1 of the reads), and 
the counts came back as:


# cat read.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
   @read[fds[arg0].fi_pathname, arg1] = count();
}


# ./read.d
dtrace: script './read.d' matched 1 probe
^C

 /export/home0/igen/pgdata/pg_clog/0014   
-2753028293472    1
 /export/home0/igen/pgdata/pg_clog/0014   
-2753028277088    1
 /export/home0/igen/pgdata/pg_clog/0015   
-2753028244320    2
 /export/home0/igen/pgdata/pg_clog/0015   
-2753028268896   14
 /export/home0/igen/pgdata/pg_clog/0015   
-2753028260704   25
 /export/home0/igen/pgdata/pg_clog/0015   
-2753028252512   27
 /export/home0/igen/pgdata/pg_clog/0015   
-2753028277088   28
 /export/home0/igen/pgdata/pg_clog/0015   
-2753028293472   37



FYI, I pressed Ctrl-C within less than a second.

So to me this suggests that multiple processes are reading the same page 
from different pids. (This was with about 600 users active.)


Apparently we do have a problem in that we are reading the same buffer 
address again. (Same as it not being cached anywhere, or not being found 
in any cache.)


I re-ran the lock wait script on a couple of processes and did not see 
CLogControlLock as a problem:


# ./83_lwlock_wait.d 14341

Lock IdMode   Count
  WALInsertLock   Exclusive   1
  ProcArrayLock   Exclusive  16

Lock Id   Combined Time (ns)
  WALInsertLock   383109
  ProcArrayLock198866236

# ./83_lwlock_wait.d 14607

Lock IdMode   Count
  WALInsertLock   Exclusive   2
  ProcArrayLock   Exclusive  15

Lock Id   Combined Time (ns)
  WALInsertLock55243
  ProcArrayLock 69700140

#

What will help you find out why it is reading the same page again?


-Jignesh



Jignesh K. Shah wrote:
I agree with Tom... somehow I think increasing NUM_CLOG_BUFFERS just 
pushes the symptom out to a later value. I promise to look more 
into it before making any recommendations to increase NUM_CLOG_BUFFERS.



Because though iGen showed improvements in that area from increasing 
num_clog_buffers, EAStress showed no improvement. Plus, the 
reason I think this is not the problem in 8.3beta1 is that the lock 
output clearly does not show CLogControlLock as the issue, which 
I had seen in the earlier case. So I don't think that increasing 
NUM_CLOG_BUFFERS will change things here.


Now, I don't understand the code very well yet, but I see three hotspots 
and am not sure if they are related to each other:
* ProcArrayLock waits - causing waits, as reported by the 
83_lwlock_wait.d script

Re: [PERFORM] 8.3beta1 testing on Solaris

2007-10-26 Thread Jignesh K. Shah


Hi George,

I actually first saw the 4MB/sec problem during an EAStress-type run with 
only 150 connections.


I will try to do more testing today that Tom has requested.

Regards,
Jignesh


Gregory Stark wrote:

Jignesh K. Shah [EMAIL PROTECTED] writes:

  

CLOG data is not cached in any PostgreSQL shared memory segments and hence
becomes the bottleneck as it has to constantly go to the filesystem to get
the read data.



This is the same bottleneck you discussed earlier. CLOG reads are cached in
the Postgres shared memory segment but only NUM_CLOG_BUFFERS are which
defaults to 8 buffers of 8kb each. With 1,000 clients and the transaction rate
you're running you needed a larger number of buffers.

Using the filesystem buffer cache is also an entirely reasonable solution
though. That's surely part of the logic behind not trying to keep more of the
clog in shared memory. Do you have any measurements of how much time is being
spent just doing the logical I/O to the buffer cache for the clog pages? 4MB/s
seems like it's not insignificant but your machine is big enough that perhaps
I'm thinking at the wrong scale.

I'm really curious whether you see any benefit from the vxid read-only
transactions. I'm not sure how to get an apples to apples comparison though.
Ideally just comparing it to CVS HEAD from immediately prior to the vxid patch
going in. Perhaps calling some function which forces an xid to be allocated
and seeing how much it slows down the benchmark would be a good substitute.

  


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] 8.3beta1 testing on Solaris

2007-10-25 Thread Jignesh K. Shah

Update on my testing 8.3beta1 on Solaris.

* CLOG reads
* Asynchronous Commit benefit
* Hot CPU Utilization

Regards,
Jignesh

__Background_:_
We were using PostgreSQL 8.3beta1 testing on our latest Sun SPARC 
Enterprise T5220 Server using Solaris 10 8/07 and Sun Fire X4200 using 
Solaris 10 8/07. Generally for performance benefits in Solaris  we put 
file systems on forcedirectio we bypass the filesystem cache and go 
direct to disks.


__Problem_:_
What we were observing that there were lots of reads happening  about 
4MB/sec on the file system holding $PGDATA and the database tables 
during an OLTP Benchmark run. Initially we thought that our bufferpools 
were not big enough. But thanks to 64-bit builds  we could use bigger 
bufferpools. However even with extraordinary bufferpool sizes we still 
saw lots of reads going to the disks.


__DTrace to the Rescue_:_

I modified iosnoop.d to just snoop on reads. The modified rsnoop.d is as 
follows:

$ cat rsnoop.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
  printf("pid %d reading %s\n", pid, fds[arg0].fi_pathname);
}

Based on it, I found that most postgresql processes were doing lots of 
reads from the pg_clog directory.
CLOG, or the commit log, keeps track of transactions in flight. Writes to 
CLOG come from recording transaction commits (or aborts) or when 
an XLOG is generated. However, though I am not clear on the reads yet, it 
seems every process constantly reads it to get some status. CLOG data is 
not cached in any PostgreSQL shared memory segment and hence becomes 
the bottleneck, as it has to constantly go to the filesystem to get the 
read data.

# ./rsnoop.d
dtrace: script './rsnoop.d' matched 1 probe
CPU IDFUNCTION:NAME
 0  49222   read:entry pid 8739 reading  
/export/home0/igen/pgdata/pg_clog/000C


 0  49222   read:entry pid 9607 reading  
/export/home0/igen/pgdata/pg_clog/000C


 0  49222   read:entry pid 9423 reading  
/export/home0/igen/pgdata/pg_clog/000C


 0  49222   read:entry pid 8731 reading  
/export/home0/igen/pgdata/pg_clog/000C


 0  49222   read:entry pid 8719 reading  
/export/home0/igen/pgdata/pg_clog/000C


 0  49222   read:entry pid 9019 reading  
/export/home0/igen/pgdata/pg_clog/000C


 1  49222   read:entry pid 9255 reading  
/export/home0/igen/pgdata/pg_clog/000C


 1  49222   read:entry pid 8867 reading  
/export/home0/igen/pgdata/pg_clog/000C



Later on  during another run I added  ustack() after the printf in the 
above script to get the function name also:


# ./rsnoop.d
dtrace: script './rsnoop.d' matched 1 probe
CPU IDFUNCTION:NAME
 0  49222   read:entry pid 10956 reading  
/export/home0/igen/pgdata/pg_clog/0011

 libc.so.1`_read+0xa
 postgres`SimpleLruReadPage+0x3e6
 postgres`SimpleLruReadPage_ReadOnly+0x9b
 postgres`TransactionIdGetStatus+0x1f
 postgres`TransactionIdDidCommit+0x42
 postgres`HeapTupleSatisfiesVacuum+0x21a
 postgres`heap_prune_chain+0x14b
 postgres`heap_page_prune_opt+0x1e6
 postgres`index_getnext+0x144
 postgres`IndexNext+0xe1
 postgres`ExecScan+0x189
 postgres`ExecIndexScan+0x43
 postgres`ExecProcNode+0x183
 postgres`ExecutePlan+0x9e
 postgres`ExecutorRun+0xab
 postgres`PortalRunSelect+0x47a
 postgres`PortalRun+0x262
 postgres`exec_execute_message+0x565
 postgres`PostgresMain+0xf45
 postgres`BackendRun+0x3f9

 0  49222   read:entry pid 10414 reading  
/export/home0/igen/pgdata/pg_clog/0011

 libc.so.1`_read+0xa
 postgres`SimpleLruReadPage+0x3e6
 postgres`SimpleLruReadPage_ReadOnly+0x9b
 postgres`TransactionIdGetStatus+0x1f
 postgres`TransactionIdDidCommit+0x42
 postgres`HeapTupleSatisfiesVacuum+0x21a
 postgres`heap_prune_chain+0x14b
 postgres`heap_page_prune_opt+0x1e6
 postgres`index_getnext+0x144
 postgres`IndexNext+0xe1
 postgres`ExecScan+0x189
^C  libc.so.1`_read+0xa
 postgres`SimpleLruReadPage+0x3e6
 postgres`SimpleLruReadPage_ReadOnly+0x9b
 postgres`TransactionIdGetStatus+0x1f
 postgres`TransactionIdDidCommit+0x42
 postgres`HeapTupleSatisfiesMVCC+0x34f
 postgres`index_getnext+0x29e
 postgres`IndexNext+0xe1
 postgres`ExecScan+0x189
 postgres`ExecIndexScan+0x43
 postgres`ExecProcNode+0x183
 postgres`ExecutePlan+0x9e
 postgres`ExecutorRun+0xab
 postgres`PortalRunSelect+0x47a
 

[PERFORM] PostgreSQL 8.3beta1 on Solaris testing case study

2007-10-25 Thread Jignesh K. Shah
I thought I would update the Performance alias too about our 
testing with PG 8.3beta1 on Solaris.


Regards,
Jignesh

__Background_:_
We were testing PostgreSQL 8.3beta1 on our latest Sun SPARC 
Enterprise T5220 server running Solaris 10 8/07. Generally, for performance 
benefits on Solaris, we put file systems on forcedirectio: we bypass the 
filesystem cache and go direct to the disks.


__Problem_:_
What we were observing that there were lots of reads happening  about 
4MB/sec on the file system holding $PGDATA and the database tables 
during an OLTP Benchmark run. Initially we thought that our bufferpools 
were not big enough. But thanks to 64-bit builds  we could use bigger 
bufferpools. However even with extraordinary bufferpool sizes we still 
saw lots of reads going to the disks.


__DTrace to the Rescue_:_

I modified iosnoop.d to just snoop on reads. The modified rsnoop.d is as 
follows:

$ cat rsnoop.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
   printf("pid %d reading %s\n", pid, fds[arg0].fi_pathname);
}

Based on it, I found that most postgresql processes were doing lots of 
reads from the pg_clog directory.
CLOG, or the commit log, keeps track of transactions in flight. Writes to 
CLOG come from recording transaction commits (or aborts) or when 
an XLOG is generated. However, though I am not clear on the reads yet, it 
seems every process constantly reads it to get some status. CLOG data is 
not cached in any PostgreSQL shared memory segment and hence becomes 
the bottleneck, as it has to constantly go to the filesystem to get the 
read data.



__Workaround for the high reads on CLOG on Solaris_:_
Start with the cluster's $PGDATA on regular UFS (which is buffered, with 
logging enabled). Always create a new tablespace for your database on a 
forcedirectio-mounted file system, which bypasses the file system cache. 
This allows all PostgreSQL CLOG files to be cached by UFS, greatly 
reducing stress on the underlying storage. For writes, to the best of my 
knowledge, PostgreSQL will still fsync to force the CLOG writes 
onto the disks so it stays consistent. But the reads are spared from going 
to the disks and are returned from the cache.


__Result_:_
With a rightly sized bufferpool, all database data can now be in the 
PostgreSQL cache, and hence reads are spared from the tablespaces. As for the 
$PGDATA data, UFS does the caching of the CLOG files, etc., and hence 
spares those reads from going to the disks again. In the end, what we 
achieve is a rightly sized bufferpool where no reads are required during a 
high-OLTP run and the disks are just busy doing the writes of updates 
and inserts.




---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] CLOG Patch

2007-08-10 Thread Jignesh K. Shah
I tried with CLOG 24 also, and I got linear performance up to 1250 users, 
after which it started to tank. 32 got us to 1350 users before some 
other bottleneck overtook it.



Based on what Tom said earlier, it might then make sense to make it a 
tunable, with a default of 8, but something one can change for high 
numbers of users.



Thanks.
Regards,
Jignesh


Simon Riggs wrote:

On Fri, 2007-08-03 at 16:09 -0400, Jignesh K. Shah wrote:

  
This patch seems to work well (both with 32 and 64 value but not with 16 
and the default 8). 



Could you test at 24 please also? Tom has pointed out the additional
cost of setting this higher, even in workloads that don't benefit from
the I/O-induced contention reduction.

  

Is there a way we can integrate this in 8.3?



I just replied to Josh's thread on -hackers about this.

  
This will improve out of box performance quite a bit for high number of 
users (atleat 30% in my OLTP test)



Yes, thats good. Will this have a dramatic effect on a particular
benchmark, or for what reason might we need this? Tom has questioned the
use case here, so I think it would be good to explain a little more for
everyone. Thanks.

  


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] CLOG Patch

2007-08-03 Thread Jignesh K. Shah

Hi Simon,

This patch seems to work well (both with the 32 and 64 values, but not with 
16 or the default 8). Is there a way we can integrate this into 8.3?


This will improve out-of-the-box performance quite a bit for high numbers 
of users (at least 30% in my OLTP test).


Regards,
Jignesh


Simon Riggs wrote:

On Thu, 2007-07-26 at 11:27 -0400, Jignesh K. Shah wrote:

  

However at 900 Users where the big drop in throughput occurs:
It gives a different top consumer of time:




postgres`LWLockAcquire+0x1c8
  

  postgres`SimpleLruReadPage+0x1ac
  postgres`TransactionIdGetStatus+0x14
  postgres`TransactionLogFetch+0x58



TransactionIdGetStatus doesn't directly call SimpleLruReadPage().
Presumably the compiler has been rearranging things??

Looks like you're out of clog buffers. It seems like the clog buffers
aren't big enough to hold clog pages for long enough and the SELECT FOR
SHARE processing is leaving lots of additional read locks that are
increasing the number of clog requests for older xids.

Try the enclosed patch.
 
  



Index: src/include/access/clog.h
===
RCS file: /projects/cvsroot/pgsql/src/include/access/clog.h,v
retrieving revision 1.19
diff -c -r1.19 clog.h
*** src/include/access/clog.h   5 Jan 2007 22:19:50 -   1.19
--- src/include/access/clog.h   26 Jul 2007 15:44:58 -
***
*** 29,35 
  
  
  /* Number of SLRU buffers to use for clog */

! #define NUM_CLOG_BUFFERS  8
  
  
  extern void TransactionIdSetStatus(TransactionId xid, XidStatus status);

--- 29,35 
  
  
  /* Number of SLRU buffers to use for clog */

! #define NUM_CLOG_BUFFERS  64  
  
  
  extern void TransactionIdSetStatus(TransactionId xid, XidStatus status);
  




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
  




Re: [PERFORM] User concurrency thresholding: where do I look?

2007-07-31 Thread Jignesh K. Shah
Yep, quite a few transactions.. But the piece that's slow is where it 
is clearing things up in CommitTransaction().
I am not sure how ProcArrayLock is designed to work, and hence not 
clear whether what we are seeing is what we expect.


Do we have some design doc on ProcArrayLock to understand its purpose?

Thanks.
Regards,
Jignesh


Josh Berkus wrote:

Simon,

  

Well that's pretty weird. That code path clearly only happens once per
transaction and ought to be fast. The other code paths that take
ProcArrayLock like TransactionIdIsInProgress() and GetSnapshotData()
ought to spend more time holding the lock. Presumably you are running
with a fair number of SERIALIZABLE transactions?



Given that this is TPCC-analog, I'd assume that we are.

Jignesh?

  


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] User concurrency thresholding: where do I look?

2007-07-30 Thread Jignesh K. Shah
With CLOG 16 the drop comes at about 1150 users, with the following lock 
stats

bash-3.00# ./4_lwlock_waits.d 16404

Lock IdMode   Count
  ProcArrayLock  Shared   2
 XidGenLock   Exclusive   2
 XidGenLock  Shared   4
  WALInsertLock   Exclusive   7
CLogControlLock  Shared   8
   WALWriteLock   Exclusive  46
  ProcArrayLock   Exclusive  64
CLogControlLock   Exclusive 263

Lock IdMode   Combined Time (ns)
 XidGenLock   Exclusive   528300
  ProcArrayLock  Shared   968800
  WALInsertLock   Exclusive  4090900
 XidGenLock  Shared 73987600
   WALWriteLock   Exclusive 86200700
  ProcArrayLock   Exclusive130756000
CLogControlLock  Shared240471000
CLogControlLock   Exclusive   4115158500

So I think 32 is a better option for CLOG buffers before ProcArrayLock becomes 
the bottleneck.


Though I haven't yet seen what we can do about the ProcArrayLock problem.


Regards,
Jignesh



Jignesh K. Shah wrote:
Using CLOG Buffers 32 and the commit sibling check patch I still see a 
drop at 1200-1300 users..




bash-3.00# ./4_lwlock_waits.d 18250

Lock IdMode   Count
 XidGenLock  Shared   1
CLogControlLock  Shared   2
  ProcArrayLock  Shared   2
 XidGenLock   Exclusive   4
CLogControlLock   Exclusive  15
  WALInsertLock   Exclusive  18
   WALWriteLock   Exclusive  38
  ProcArrayLock   Exclusive  77

Lock IdMode   Combined Time (ns)
 XidGenLock  Shared88700
  WALInsertLock   Exclusive 69556000
  ProcArrayLock  Shared 95656800
 XidGenLock   Exclusive139634100
CLogControlLock   Exclusive148822200
CLogControlLock  Shared16163
   WALWriteLock   Exclusive332781800
  ProcArrayLock   Exclusive   5688265500

bash-3.00# ./4_lwlock_waits.d 18599

Lock IdMode   Count
  ProcArrayLock  Shared   2
 XidGenLock   Exclusive   3
 XidGenLock  Shared   4
CLogControlLock  Shared   5
  WALInsertLock   Exclusive  10
CLogControlLock   Exclusive  21
   WALWriteLock   Exclusive  28
  ProcArrayLock   Exclusive  54

Lock IdMode   Combined Time (ns)
 XidGenLock   Exclusive  5688800
  WALInsertLock   Exclusive 11424700
CLogControlLock  Shared 55589100
  ProcArrayLock  Shared135220400
   WALWriteLock   Exclusive177906900
 XidGenLock  Shared524146500
CLogControlLock   Exclusive524563900
  ProcArrayLock   Exclusive   5828744500

bash-3.00#
bash-3.00# ./6_lwlock_stack.d 4 18599

Lock IdMode   Count
  ProcArrayLock  Shared   1
  ProcArrayLock   Exclusive  52

Lock IdMode   Combined Time (ns)
  ProcArrayLock  Shared 41428300
  ProcArrayLock   Exclusive   3858386500

Lock Id   Combined Time (ns)


 postgres`LWLockAcquire+0x1f0
 postgres`GetSnapshotData+0x120
 postgres`GetTransactionSnapshot+0x80
 postgres`PortalStart+0x198
 postgres`exec_bind_message+0x84c
 postgres`PostgresMain+0x17f8
 postgres`BackendRun+0x2f8
 postgres`ServerLoop+0x680
 postgres`PostmasterMain+0xda8
 postgres`main+0x3d0
 postgres`_start+0x17c
 Shared 41428300

 postgres`LWLockAcquire+0x1f0
 postgres`CommitTransaction+0x104
 postgres`CommitTransactionCommand+0xbc
 postgres`finish_xact_command+0x78
 postgres`exec_execute_message+0x42c
 postgres`PostgresMain+0x1838
 postgres`BackendRun+0x2f8
 postgres`ServerLoop+0x680
 postgres`PostmasterMain+0xda8
 postgres`main+0x3d0
 postgres`_start+0x17c
 Exclusive3858386500


-Jignesh



Re: [PERFORM] User concurrency thresholding: where do I look?

2007-07-27 Thread Jignesh K. Shah

Yes, I can try to break up the shared and exclusive time..

Also, yes, I use commit_delay = 10; it helps out a lot in reducing IO load..

I will try out the patch soon.

-Jignesh


Simon Riggs wrote:

On Thu, 2007-07-26 at 17:17 -0400, Jignesh K. Shah wrote:
  

 Lock Id   Combined Time (ns)
  XidGenLock194966200
   WALInsertLock517955000
 CLogControlLock679665100
WALWriteLock   2838716200
   ProcArrayLock  44181002600



Is this the time the lock is held for or the time that we wait for that
lock? It would be good to see the break down of time separately for
shared and exclusive.

Can we have a table like this:
LockId,LockMode,SumTimeLockHeld,SumTimeLockWait

  
The top wait time seems to come from the following code path for 
ProcArrayLock:


 Lock IdMode   Count
   ProcArrayLock   Exclusive  21

 Lock Id   Combined Time (ns)
   ProcArrayLock   5255937500

 Lock Id   Combined Time (ns)


  postgres`LWLockAcquire+0x1f0
  postgres`CommitTransaction+0x104
  postgres`CommitTransactionCommand+0xbc
  postgres`finish_xact_command+0x78



Well that's pretty weird. That code path clearly only happens once per
transaction and ought to be fast. The other code paths that take
ProcArrayLock like TransactionIdIsInProgress() and GetSnapshotData()
ought to spend more time holding the lock. Presumably you are running
with a fair number of SERIALIZABLE transactions? 


Are you running with commit_delay > 0? It's possible that the call to
CountActiveBackends() is causing pinging of the procarray by other
backends while we're trying to read it during CommitTransaction(). If
so, try the attached patch.

  


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] User concurrency thresholding: where do I look?

2007-07-27 Thread Jignesh K. Shah
I tried CLOG buffers at 32 and the performance is as good as with 64.. (I 
haven't tried 16 yet, though..) I am going to try your second patch now..


Also, here is the breakdown by mode. The combined time is the total time 
spent waiting, summed over all counts.



Lock IdMode   Count
  ProcArrayLock  Shared   1
CLogControlLock   Exclusive   4
CLogControlLock  Shared   4
 XidGenLock  Shared   4
 XidGenLock   Exclusive   7
  WALInsertLock   Exclusive  21
   WALWriteLock   Exclusive  62
  ProcArrayLock   Exclusive  79

Lock IdModeCombined Time (ns)
CLogControlLockExclusive   325200
CLogControlLock   Shared  4509200
 XidGenLockExclusive 11839600
  ProcArrayLock   Shared 40506600
 XidGenLock   Shared119013700
  WALInsertLockExclusive148063100
   WALWriteLockExclusive347052100
  ProcArrayLockExclusive   1054780600

Here is another one at higher user count 1600:

bash-3.00# ./4_lwlock_waits.d 9208

Lock IdMode   Count
CLogControlLock   Exclusive   1
CLogControlLock  Shared   2
 XidGenLock  Shared   7
  WALInsertLock   Exclusive  12
   WALWriteLock   Exclusive  50
  ProcArrayLock   Exclusive  82

Lock IdMode   Combined Time (ns)
CLogControlLockExclusive27300
 XidGenLock   Shared 14689300
CLogControlLock   Shared 72664900
  WALInsertLockExclusive101431300
   WALWriteLockExclusive534357400
  ProcArrayLockExclusive   4110350300

Now I will try with your second patch.

Regards,
Jignesh

Simon Riggs wrote:

On Thu, 2007-07-26 at 17:17 -0400, Jignesh K. Shah wrote:
  

 Lock Id   Combined Time (ns)
  XidGenLock194966200
   WALInsertLock517955000
 CLogControlLock679665100
WALWriteLock   2838716200
   ProcArrayLock  44181002600



Is this the time the lock is held for or the time that we wait for that
lock? It would be good to see the break down of time separately for
shared and exclusive.

Can we have a table like this:
LockId,LockMode,SumTimeLockHeld,SumTimeLockWait

  
The top wait time seems to come from the following code path for 
ProcArrayLock:


 Lock IdMode   Count
   ProcArrayLock   Exclusive  21

 Lock Id   Combined Time (ns)
   ProcArrayLock   5255937500

 Lock Id   Combined Time (ns)


  postgres`LWLockAcquire+0x1f0
  postgres`CommitTransaction+0x104
  postgres`CommitTransactionCommand+0xbc
  postgres`finish_xact_command+0x78



Well that's pretty weird. That code path clearly only happens once per
transaction and ought to be fast. The other code paths that take
ProcArrayLock like TransactionIdIsInProgress() and GetSnapshotData()
ought to spend more time holding the lock. Presumably you are running
with a fair number of SERIALIZABLE transactions? 


Are you running with commit_delay > 0? It's possible that the call to
CountActiveBackends() is causing pinging of the procarray by other
backends while we're trying to read it during CommitTransaction(). If
so, try the attached patch.

  



Index: src/backend/access/transam/xact.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.245
diff -c -r1.245 xact.c
*** src/backend/access/transam/xact.c   7 Jun 2007 21:45:58 -   1.245
--- src/backend/access/transam/xact.c   27 Jul 2007 09:09:08 -
***
*** 820,827 
 * are fewer than CommitSiblings other backends with 
active
 * transactions.
 */
!   if (CommitDelay > 0 && enableFsync &&
!   CountActiveBackends() >= CommitSiblings)
pg_usleep(CommitDelay);
  
  			XLogFlush(recptr);

--- 820,826 
 * are fewer than CommitSiblings other backends with 
active
 * transactions.
 */
!   if (CommitDelay > 0 && enableFsync)
pg_usleep(CommitDelay);
  
  			XLogFlush(recptr);

Re: [PERFORM] User concurrency thresholding: where do I look?

2007-07-27 Thread Jignesh K. Shah
Using CLOG Buffers 32 and the commit sibling check patch I still see a 
drop at 1200-1300 users..




bash-3.00# ./4_lwlock_waits.d 18250

Lock IdMode   Count
 XidGenLock  Shared   1
CLogControlLock  Shared   2
  ProcArrayLock  Shared   2
 XidGenLock   Exclusive   4
CLogControlLock   Exclusive  15
  WALInsertLock   Exclusive  18
   WALWriteLock   Exclusive  38
  ProcArrayLock   Exclusive  77

Lock IdMode   Combined Time (ns)
 XidGenLock  Shared88700
  WALInsertLock   Exclusive 69556000
  ProcArrayLock  Shared 95656800
 XidGenLock   Exclusive139634100
CLogControlLock   Exclusive148822200
CLogControlLock  Shared16163
   WALWriteLock   Exclusive332781800
  ProcArrayLock   Exclusive   5688265500

bash-3.00# ./4_lwlock_waits.d 18599

Lock IdMode   Count
  ProcArrayLock  Shared   2
 XidGenLock   Exclusive   3
 XidGenLock  Shared   4
CLogControlLock  Shared   5
  WALInsertLock   Exclusive  10
CLogControlLock   Exclusive  21
   WALWriteLock   Exclusive  28
  ProcArrayLock   Exclusive  54

Lock IdMode   Combined Time (ns)
 XidGenLock   Exclusive  5688800
  WALInsertLock   Exclusive 11424700
CLogControlLock  Shared 55589100
  ProcArrayLock  Shared135220400
   WALWriteLock   Exclusive177906900
 XidGenLock  Shared524146500
CLogControlLock   Exclusive524563900
  ProcArrayLock   Exclusive   5828744500

bash-3.00#
bash-3.00# ./6_lwlock_stack.d 4 18599

Lock IdMode   Count
  ProcArrayLock  Shared   1
  ProcArrayLock   Exclusive  52

Lock IdMode   Combined Time (ns)
  ProcArrayLock  Shared 41428300
  ProcArrayLock   Exclusive   3858386500

Lock Id   Combined Time (ns)


 postgres`LWLockAcquire+0x1f0
 postgres`GetSnapshotData+0x120
 postgres`GetTransactionSnapshot+0x80
 postgres`PortalStart+0x198
 postgres`exec_bind_message+0x84c
 postgres`PostgresMain+0x17f8
 postgres`BackendRun+0x2f8
 postgres`ServerLoop+0x680
 postgres`PostmasterMain+0xda8
 postgres`main+0x3d0
 postgres`_start+0x17c
 Shared 41428300

 postgres`LWLockAcquire+0x1f0
 postgres`CommitTransaction+0x104
 postgres`CommitTransactionCommand+0xbc
 postgres`finish_xact_command+0x78
 postgres`exec_execute_message+0x42c
 postgres`PostgresMain+0x1838
 postgres`BackendRun+0x2f8
 postgres`ServerLoop+0x680
 postgres`PostmasterMain+0xda8
 postgres`main+0x3d0
 postgres`_start+0x17c
 Exclusive3858386500


-Jignesh




Re: [PERFORM] User concurrency thresholding: where do I look?

2007-07-26 Thread Jignesh K. Shah
  WALInsertLock  2177500
CLogControlLock  9064700
  ProcArrayLock199216000

bash-3.00# ./6_lwlock_stack.d 4 7056

Lock IdMode   Count
  ProcArrayLock  Shared   3
  ProcArrayLock   Exclusive  38

Lock Id   Combined Time (ns)
  ProcArrayLock858238600

Lock Id   Combined Time (ns)


 postgres`LWLockAcquire+0x1c8
 postgres`TransactionIdIsInProgress+0x50
 postgres`HeapTupleSatisfiesVacuum+0x2ec
 postgres`_bt_check_unique+0x2a0
 postgres`_bt_doinsert+0x98
 postgres`btinsert+0x54
 postgres`FunctionCall6+0x44
 postgres`index_insert+0x90
 postgres`ExecInsertIndexTuples+0x1bc
 postgres`ExecUpdate+0x500
 postgres`ExecutePlan+0x704
 postgres`ExecutorRun+0x60
 postgres`PortalRunMulti+0x2a0
 postgres`PortalRun+0x310
 postgres`exec_execute_message+0x3a0
 postgres`PostgresMain+0x1300
 postgres`BackendRun+0x278
 postgres`ServerLoop+0x63c
 postgres`PostmasterMain+0xc40
 postgres`main+0x394
  167100

 postgres`LWLockAcquire+0x1c8
 postgres`GetSnapshotData+0x118
 postgres`GetTransactionSnapshot+0x5c
 postgres`PortalRunMulti+0x22c
 postgres`PortalRun+0x310
 postgres`exec_execute_message+0x3a0
 postgres`PostgresMain+0x1300
 postgres`BackendRun+0x278
 postgres`ServerLoop+0x63c
 postgres`PostmasterMain+0xc40
 postgres`main+0x394
 postgres`_start+0x108
 7125900

 postgres`LWLockAcquire+0x1c8
 postgres`CommitTransaction+0xe0
 postgres`CommitTransactionCommand+0x90
 postgres`finish_xact_command+0x60
 postgres`exec_execute_message+0x3d8
 postgres`PostgresMain+0x1300
 postgres`BackendRun+0x278
 postgres`ServerLoop+0x63c
 postgres`PostmasterMain+0xc40
 postgres`main+0x394
 postgres`_start+0x108
   850945600

bash-3.00# echo 1100 users - DROP 
1100 users
bash-3.00# ./4_lwlock_waits.d 7056

Lock IdMode   Count
CLogControlLock  Shared   1
   WALWriteLock   Exclusive   1
 XidGenLock   Exclusive   1
  ProcArrayLock  Shared   2
  WALInsertLock   Exclusive   2
 XidGenLock  Shared   2
CLogControlLock   Exclusive   4
  ProcArrayLock   Exclusive  20

Lock Id   Combined Time (ns)
   WALWriteLock  4179500
 XidGenLock  6249400
CLogControlLock 20411100
  WALInsertLock 29707600
  ProcArrayLock207923700

bash-3.00# ./6_lwlock_stack.d 4 7056

Lock IdMode   Count
  ProcArrayLock   Exclusive  40

Lock Id   Combined Time (ns)
  ProcArrayLock692242100

Lock Id   Combined Time (ns)


 postgres`LWLockAcquire+0x1c8
 postgres`CommitTransaction+0xe0
 postgres`CommitTransactionCommand+0x90
 postgres`finish_xact_command+0x60
 postgres`exec_execute_message+0x3d8
 postgres`PostgresMain+0x1300
 postgres`BackendRun+0x278
 postgres`ServerLoop+0x63c
 postgres`PostmasterMain+0xc40
 postgres`main+0x394
 postgres`_start+0x108
   692242100

bash-3.00#




LockID for ProcArrayLock is 4, i.e. the 5th entry in lwlock.h, which seems 
to confirm it is ProcArrayLock.

Any tweaks for that?

-Jignesh


Simon Riggs wrote:

On Thu, 2007-07-26 at 11:27 -0400, Jignesh K. Shah wrote:

  

However at 900 Users where the big drop in throughput occurs:
It gives a different top consumer of time:




postgres`LWLockAcquire+0x1c8
  

  postgres`SimpleLruReadPage+0x1ac
  postgres`TransactionIdGetStatus+0x14
  postgres`TransactionLogFetch+0x58



TransactionIdGetStatus doesn't directly call SimpleLruReadPage().
Presumably the compiler has been rearranging things??

Looks like you're out of clog buffers. It seems like the clog buffers
aren't big enough to hold clog pages for long enough and the SELECT FOR
SHARE processing is leaving lots of additional read locks that are
increasing the number of clog requests for older xids.

Try the enclosed patch.
 
  



Index: src/include/access/clog.h
===
RCS

Re: [PERFORM] User concurrency thresholding: where do I look?

2007-07-26 Thread Jignesh K. Shah

I will look for runs with longer samples..

Now, the script could have mislabeled lock names.. Anyway, digging into 
the one that seems to increase over time, I did stack profiles on how it 
increases... and here are some numbers..



For 600-850 users, that potentially mislabeled CheckpointStartLock (i.e. 
LockID==12) comes from various sources; the top source (while the system 
is still doing great) is:




 postgres`LWLockAcquire+0x1c8
 postgres`SimpleLruReadPage_ReadOnly+0xc
 postgres`TransactionIdGetStatus+0x14
 postgres`TransactionLogFetch+0x58
 postgres`TransactionIdDidCommit+0x4
 postgres`HeapTupleSatisfiesSnapshot+0x234
 postgres`heap_release_fetch+0x1a8
 postgres`index_getnext+0xf4
 postgres`IndexNext+0x7c
 postgres`ExecScan+0x8c
 postgres`ExecProcNode+0xb4
 postgres`ExecutePlan+0xdc
 postgres`ExecutorRun+0xb0
 postgres`PortalRunSelect+0x9c
 postgres`PortalRun+0x244
 postgres`exec_execute_message+0x3a0
 postgres`PostgresMain+0x1300
 postgres`BackendRun+0x278
 postgres`ServerLoop+0x63c
 postgres`PostmasterMain+0xc40
 8202100

 postgres`LWLockAcquire+0x1c8
 postgres`TransactionIdSetStatus+0x1c
 postgres`RecordTransactionCommit+0x2a8
 postgres`CommitTransaction+0xc8
 postgres`CommitTransactionCommand+0x90
 postgres`finish_xact_command+0x60
 postgres`exec_execute_message+0x3d8
 postgres`PostgresMain+0x1300
 postgres`BackendRun+0x278
 postgres`ServerLoop+0x63c
 postgres`PostmasterMain+0xc40
 postgres`main+0x394
 postgres`_start+0x108
30822100


However at 900 Users where the big drop in throughput occurs:
It gives a different top consumer of time:



 postgres`LWLockAcquire+0x1c8
 postgres`TransactionIdSetStatus+0x1c
 postgres`RecordTransactionCommit+0x2a8
 postgres`CommitTransaction+0xc8
 postgres`CommitTransactionCommand+0x90
 postgres`finish_xact_command+0x60
 postgres`exec_execute_message+0x3d8
 postgres`PostgresMain+0x1300
 postgres`BackendRun+0x278
 postgres`ServerLoop+0x63c
 postgres`PostmasterMain+0xc40
 postgres`main+0x394
 postgres`_start+0x108
   406601300

 postgres`LWLockAcquire+0x1c8
 postgres`SimpleLruReadPage+0x1ac
 postgres`TransactionIdGetStatus+0x14
 postgres`TransactionLogFetch+0x58
 postgres`TransactionIdDidCommit+0x4
 postgres`HeapTupleSatisfiesUpdate+0x360
 postgres`heap_lock_tuple+0x27c
 postgres`ExecutePlan+0x33c
 postgres`ExecutorRun+0xb0
 postgres`PortalRunSelect+0x9c
 postgres`PortalRun+0x244
 postgres`exec_execute_message+0x3a0
 postgres`PostgresMain+0x1300
 postgres`BackendRun+0x278
 postgres`ServerLoop+0x63c
 postgres`PostmasterMain+0xc40
 postgres`main+0x394
 postgres`_start+0x108
   444523100

 postgres`LWLockAcquire+0x1c8
 postgres`SimpleLruReadPage+0x1ac
 postgres`TransactionIdGetStatus+0x14
 postgres`TransactionLogFetch+0x58
 postgres`TransactionIdDidCommit+0x4
 postgres`HeapTupleSatisfiesSnapshot+0x234
 postgres`heap_release_fetch+0x1a8
 postgres`index_getnext+0xf4
 postgres`IndexNext+0x7c
 postgres`ExecScan+0x8c
 postgres`ExecProcNode+0xb4
 postgres`ExecutePlan+0xdc
 postgres`ExecutorRun+0xb0
 postgres`PortalRunSelect+0x9c
 postgres`PortalRun+0x244
 postgres`exec_execute_message+0x3a0
 postgres`PostgresMain+0x1300
 postgres`BackendRun+0x278
 postgres`ServerLoop+0x63c
 postgres`PostmasterMain+0xc40
  166130



Maybe you all will understand better than I do what it is doing here.. 
It looks to me like IndexNext has a problem at a high number of users.. 
but I could be wrong..


-Jignesh




Tom Lane wrote:

Jignesh K. Shah [EMAIL PROTECTED] writes:
  
The count is only for a 10-second snapshot.. Plus remember there are 
about 1000 users running so the connection  being profiled only gets 
0.01 of the period on CPU..  And the count is for that CONNECTION only.



OK, that explains the low absolute levels of the counts, but if the
counts are for a regular backend then there's still a lot of bogosity
here.  Backends won't be taking the CheckpointLock at all, nor do they
take CheckpointStartLock in exclusive mode.  The bgwriter would do

Re: [PERFORM] User concurrency thresholding: where do I look?

2007-07-26 Thread Jignesh K. Shah
   Exclusive 102

Lock Id   Combined Time (ns)
 OidGenLock21900
  WALInsertLock82500
 XidGenLock  3313400
CheckpointStartLock   1448289900

bash-3.00# echo 1050 users
1050 users
bash-3.00# ./4_lwlock_waits.d 20764

Lock IdMode   Count
  FreeSpaceLock   Exclusive   1
CheckpointStartLock  Shared   3
 XidGenLock   Exclusive   3
CheckpointStartLock   Exclusive 146

Lock Id   Combined Time (ns)
  FreeSpaceLock18400
 XidGenLock  1900900
CheckpointStartLock   2392893700

bash-3.00#






-Jignesh




Tom Lane wrote:

Jignesh K. Shah [EMAIL PROTECTED] writes:
  

Here is how I got the numbers..
I had about 1600 users log into postgresql. Then I started the run with 
500 users, and using DTrace I started tracking PostgreSQL locking as 
viewed from one user/connection. Echo statements indicate how many 
users were active at that point and how throughput was performing. All 
IO is done on /tmp, which means on a RAM disk.



  

bash-3.00# echo 500 users - baseline number
500 users
bash-3.00# ./3_lwlock_acquires.d 19178


I don't think I believe these numbers.  For one thing, CheckpointLock
is simply not ever taken in shared mode.  The ratios of counts for
different locks seems pretty improbable too, eg there is no way on
earth that the LockMgrLocks are taken more often shared than
exclusive (I would expect no shared acquires at all in the sort of
test you are running).  Not to mention that the absolute number of
counts seems way too low.  So I think the counting tool is broken.

  
Combined time of what exactly?  It looks like this must be the total

duration the lock is held, at least assuming that the time for
CheckpointLock is correctly reported.  It'd be much more useful to see
the total time spent waiting to acquire the lock.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq
  




Re: [PERFORM] User concurrency thresholding: where do I look?

2007-07-26 Thread Jignesh K. Shah

Will try 16 and 32 CLOG buffers tomorrow:

But here is the lock data again, with increased profiling time (about 
2 minutes), for the connection, with about 2000 users:


bash-3.00# time ./4_lwlock_waits.d 13583
^C

Lock IdMode   Count
  ProcArrayLock  Shared   5
 XidGenLock   Exclusive  13
CLogControlLock  Shared  14
 XidGenLock  Shared  15
CLogControlLock   Exclusive  21
  WALInsertLock   Exclusive  77
   WALWriteLock   Exclusive 175
  ProcArrayLock   Exclusive 275

Lock Id   Combined Time (ns)
 XidGenLock194966200
  WALInsertLock517955000
CLogControlLock679665100
   WALWriteLock   2838716200
  ProcArrayLock  44181002600


The top wait time seems to come from the following code path for 
ProcArrayLock:


Lock IdMode   Count
  ProcArrayLock   Exclusive  21

Lock Id   Combined Time (ns)
  ProcArrayLock   5255937500

Lock Id   Combined Time (ns)


 postgres`LWLockAcquire+0x1f0
 postgres`CommitTransaction+0x104
 postgres`CommitTransactionCommand+0xbc
 postgres`finish_xact_command+0x78
 postgres`exec_execute_message+0x42c
 postgres`PostgresMain+0x1838
 postgres`BackendRun+0x2f8
 postgres`ServerLoop+0x680
 postgres`PostmasterMain+0xda8
 postgres`main+0x3d0
 postgres`_start+0x17c
  5255937500



Regards,
Jignesh


Simon Riggs wrote:

On Thu, 2007-07-26 at 15:44 -0400, Jignesh K. Shah wrote:
  

BEAUTIFUL!!!

Using the patch I can now go up to 1300 users without dropping.. But the 
drop still repeats at 1300-1350 users..



OK, can you try again with 16 and 32 buffers please?  We need to know
how many is enough and whether this number needs to be variable via a
parameter, or just slightly larger by default. Thanks.

  
I corrected the lock descriptions based on what I got from lwlock.h and 
retried the whole scalability test with profiling again.. This time it 
looks like the ProcArrayLock



That's what I would expect with that many users.

  

 Lock IdMode   Count
  XidGenLock   Exclusive   1
 CLogControlLock  Shared   2
  XidGenLock  Shared   2
WALWriteLock   Exclusive   4
   WALInsertLock   Exclusive   8
 CLogControlLock   Exclusive   9
   ProcArrayLock   Exclusive   9



...but as Tom mentioned, we need to do longer runs now so these counts
get to somewhere in the hundreds so we have some statistical validity.

  




Re: [PERFORM] User concurrency thresholding: where do I look?

2007-07-25 Thread Jignesh K. Shah

Here is how I got the numbers..
I had about 1600 users log into postgresql. Then I started the run with 
500 users, and using DTrace I started tracking PostgreSQL locking as 
viewed from one user/connection. Echo statements indicate how many 
users were active at that point and how throughput was performing. All 
IO is done on /tmp, which means on a RAM disk.


bash-3.00# echo 500 users - baseline number
500 users
bash-3.00# ./3_lwlock_acquires.d 19178

Lock IdMode   Count
   FirstLockMgrLock   Exclusive   1
   RelCacheInitLock   Exclusive   2
 SInvalLock   Exclusive   2
  WALInsertLock   Exclusive  10
 BufMappingLock   Exclusive  12
 CheckpointLock  Shared  29
CheckpointStartLock  Shared  29
 OidGenLock   Exclusive  29
 XidGenLock   Exclusive  29
   FirstLockMgrLock  Shared  33
CheckpointStartLock   Exclusive  78
  FreeSpaceLock   Exclusive 114
 OidGenLock  Shared 126
 XidGenLock  Shared 152
  ProcArrayLock  Shared 482

Lock Id   Combined Time (ns)
 SInvalLock29800
   RelCacheInitLock30300
 BufMappingLock   168800
   FirstLockMgrLock   414300
  FreeSpaceLock  1281700
  ProcArrayLock  7869900
  WALInsertLock 3200
CheckpointStartLock 13494700
 OidGenLock 25719100
 XidGenLock 26443300
 CheckpointLock194267800

bash-3.00# echo 600 users - Throughput rising
600 users
bash-3.00# ./3_lwlock_acquires.d 19178

Lock IdMode   Count
   RelCacheInitLock   Exclusive   1
 SInvalLock   Exclusive   1
 BufMappingLock   Exclusive   2
CLogControlLock   Exclusive   2
  WALInsertLock   Exclusive  11
   FirstLockMgrLock  Shared  20
 CheckpointLock  Shared  24
CheckpointStartLock  Shared  24
 OidGenLock   Exclusive  24
 XidGenLock   Exclusive  24
CheckpointStartLock   Exclusive  72
  FreeSpaceLock   Exclusive 102
 OidGenLock  Shared 106
 XidGenLock  Shared 128
  ProcArrayLock  Shared 394

Lock Id   Combined Time (ns)
 SInvalLock15600
   RelCacheInitLock15700
 BufMappingLock31000
CLogControlLock41500
   FirstLockMgrLock   289000
  FreeSpaceLock  3045400
CheckpointStartLock  7371800
  WALInsertLock  9383200
  ProcArrayLock 10457900
 OidGenLock 20005900
 XidGenLock 20331500
 CheckpointLock187067900

bash-3.00# echo 700 users  - Throughput rising
700 users
bash-3.00# ./3_lwlock_acquires.d 19178

Lock IdMode   Count
   RelCacheInitLock   Exclusive   1
 SInvalLock   Exclusive   1
 BufMappingLock   Exclusive   2
  WALInsertLock   Exclusive  17
 CheckpointLock  Shared  33
CheckpointStartLock  Shared  33
 OidGenLock   Exclusive  33
 XidGenLock   Exclusive  33
   FirstLockMgrLock  Shared  81
CheckpointStartLock   Exclusive  87
  FreeSpaceLock   Exclusive 124
 OidGenLock  Shared 125
 XidGenLock  Shared 150
  ProcArrayLock  Shared 500

Lock Id   Combined Time (ns)
   RelCacheInitLock15100
 SInvalLock15400
 BufMappingLock47400
   FirstLockMgrLock  3021000
  FreeSpaceLock  3794300
  WALInsertLock  7567300
 XidGenLock 18427400
  ProcArrayLock 20884000
CheckpointStartLock 24084900
 OidGenLock 26399500
 CheckpointLock256549800

bash-3.00# echo 800 users  - Throughput rising
800 users
bash-3.00# ./3_lwlock_acquires.d 19178

Lock IdMode   Count
 BufMappingLock   Exclusive   1
   RelCacheInitLock   Exclusive   1
 SInvalLock   Exclusive   1
   WALWriteLock   Exclusive   1
  WALInsertLock 

[PERFORM] Second SpecJAppserver2004 with PostgreSQL

2007-07-23 Thread Jignesh K. Shah

http://blogs.sun.com/jkshah/entry/specjappserver2004_with_glassfish_v2_and

This time with 33% less app server hardware but the same setup for 
PostgreSQL 8.2.4, and a 4.5% better score.. There has been a reduction in 
CPU utilization by PostgreSQL with the new app server, which means there 
is potential to do more JOPS. But it looks like Simon Riggs added another 
project for us: looking into the maxalign side effect with Solaris on 
SPARC before doing more benchmarks.


Cheers,
Jignesh


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] User concurrency thresholding: where do I look?

2007-07-20 Thread Jignesh K. Shah

Awww Josh,

I was just enjoying the chat on the picket fence! :-)

Anyway, the workload is mixed (reads, writes) with simple to medium 
queries. The workload is known to scale well. But in order to provide 
substantial input I am still trying to eliminate things that can 
bottleneck.  Currently I have eliminated CPU (plenty free), RAM 
(48GB in this server for a 32-bit postgresql instance), and IO storage 
(used the free RAM for a /tmp database to eliminate IO), and am still 
trying to eliminate any network bottlenecks so I can say for sure we 
have a problem in PostgreSQL. But yes, till that final thing is confirmed 
(the network, which can very well be the case) it could be a problem 
somewhere else. However, the thing that worries me is the big drop, 
rather than throughput just remaining constant out there..


Anyway, more on this within a day or two, once I add more network NICs 
between the systems to eliminate network problems (even though the stats 
don't show them as problems right now) and also reduce malloc lock 
penalties, if any.


As for other questions:

max_locks_per_transaction is set to the default (10, I believe); increasing 
it still seems to degrade the overall throughput number.


max_connections is set to 1500 for now, till I get decent scaling up to 
1400-1500 users.


There are no hard failures reported anywhere. log_min_duration_statement 
output does show that queries are now slowing down and taking longer.


The OS is not swapping, and I have also eliminated IO by putting the whole 
database on /tmp.


So while I finish adding more network connections between the two 
systems (need to get cards) do enjoy the following URL :-)


http://www.spec.org/jAppServer2004/results/res2007q3/jAppServer2004-20070703-00073.html

Of course the postgresql.conf still remains from the old test, so no 
flames on that one again :-)


Regards,
Jignesh




Josh Berkus wrote:

Tom,

  

Well, if the load is a lot of short writing transactions then you'd
expect the throughput to depend on how fast stuff can be pushed down to
WAL.  What have you got wal_buffers set to?  Are you using a commit
delay?  What's the I/O system anyway (any BB write cache on the WAL
disk?) and what wal sync method are you using?



You know, I think Jignesh needs to be on this list so I can stop relaying 
questions about a workload I didn't design.  Let me get him.


  


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] User concurrency thresholding: where do I look?

2007-07-20 Thread Jignesh K. Shah
I forgot to add one more piece of information.. I also tried the same 
test with 64-bit postgresql with 6GB shared_buffers, and the results are 
the same: it drops around the same point, which to me sounds like a bottleneck..


More later

-Jignesh


Jignesh K. Shah wrote:

Awww Josh,

I was just enjoying the chat on the picket fence! :-)

Anyway the workload is mixed (reads, writes) with simple to medium 
queries. The workload is known to scale well. But in order to provide 
substantial input I am still trying to eliminate things that can 
bottleneck.  Currently I have eliminated CPU (plenty free), RAM 
(48GB in this server for a 32-bit postgresql instance), and IO/storage 
(used the free RAM for a /tmp database to eliminate IO), and am still 
trying to eliminate any network bottlenecks before saying for sure we 
have a problem in PostgreSQL. But yes, till that final thing is confirmed 
(network, which can very well be the case) it could be a problem 
somewhere else. However, the thing that worries me is the big drop, 
rather than throughput remaining constant out there..


Anyway, more on this within a day or two, once I add more network NICs 
between the systems to eliminate network problems (even though stats 
don't show them as problems right now) and also reduce malloc lock 
penalties, if any.


As for other questions:

max_locks_per_transaction is set to the default (10, I believe); increasing 
it still seems to degrade the overall throughput number.


max_connections is set to 1500 for now, until I get decent scaling up to 
1400-1500 users.


There are no hard failures reported anywhere. log_min_duration_statement 
output does show that queries are now slowing down and taking longer.


The OS is not swapping, and I have also eliminated IO by putting the whole 
database on /tmp


So while I finish adding more network connections between the two 
systems (I need to get cards), do enjoy the following URL :-)


http://www.spec.org/jAppServer2004/results/res2007q3/jAppServer2004-20070703-00073.html 



Of course, the postgresql.conf still remains from the old test, so no 
flames on that one again :-)


Regards,
Jignesh




Josh Berkus wrote:

Tom,

 

Well, if the load is a lot of short writing transactions then you'd
expect the throughput to depend on how fast stuff can be pushed down to
WAL.  What have you got wal_buffers set to?  Are you using a commit
delay?  What's the I/O system anyway (any BB write cache on the WAL
disk?) and what wal sync method are you using?



You know, I think Jignesh needs to be on this list so I can stop 
relaying questions on a workload I didn't design.  Let me get him.


  


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] User concurrency thresholding: where do I look?

2007-07-20 Thread Jignesh K. Shah
Yes, I did see an increase in context switches and CPU migrations at that 
point using mpstat.


Regards,
Jignesh


Tom Lane wrote:

Jignesh K. Shah [EMAIL PROTECTED] writes:
  
There are no hard failures reported anywhere. log_min_duration_statement 
output does show that queries are now slowing down and taking longer.
The OS is not swapping, and I have also eliminated IO by putting the whole 
database on /tmp



Hmm.  Do you see any evidence of a context swap storm (ie, a drastic
increase in the context swaps/second reading reported by vmstat)?

regards, tom lane
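
The "drastic increase in cs/second" that Tom describes can be checked mechanically. Below is a minimal sketch (not from the thread): it assumes a vmstat layout that includes a `cs` column in its field-name row, which varies by platform, so the column is located from the header rather than hard-coded.

```python
# Sketch: flag a context-switch storm in `vmstat 1` output, i.e. a
# drastic jump in the "cs" column relative to the first sample.

def detect_storm(lines, factor=5.0):
    """Return True if any sample's cs rate exceeds `factor` times
    the first sample's rate."""
    fields = None
    baseline = None
    for line in lines:
        parts = line.split()
        if fields is None:
            if "cs" in parts:
                fields = parts          # the field-name header row
            continue
        if not parts or not parts[0].isdigit():
            continue                    # repeated headers, blank lines
        try:
            cs = int(parts[fields.index("cs")])
        except (ValueError, IndexError):
            continue
        if baseline is None:
            baseline = cs               # first data row sets the baseline
        elif cs > factor * baseline:
            return True
    return False
```

Feeding it the captured output of `vmstat 1` over the ramp-up period would show whether the throughput drop coincides with a storm.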

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match
  


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] User concurrency thresholding: where do I look?

2007-07-20 Thread Jignesh K. Shah


Tom Lane wrote:

Jignesh K. Shah [EMAIL PROTECTED] writes:
Yes I did see increase in context switches and CPU migrations at that 
point using mpstat.


So follow that up --- try to determine which lock is being contended
for.  There's some very crude code in the sources that you can enable
with -DLWLOCK_STATS, but probably DTrace would be a better tool.

regards, tom lane


Using plockstat -A -s 5 -p $pid

On bgwriter: it doesn't report anything

On one of the many connections:

This one is hard to read easily. From the plockstat man page: by default, 
plockstat monitors all lock contention events, gathers frequency and 
timing data about those events, and displays the data in decreasing 
frequency order, so that the most common events appear first.


^Cbash-3.00# plockstat -A -s 5  -p 6401
^C
Mutex hold

---
Count nsec Lock Caller
  59   186888 0x10059e280  libumem.so.1`process_free+0x12c

 nsec  Time Distribution --- count Stack
16384 || 1 libumem.so.1`process_free+0x12c
32768 |@   |14 postgres`AllocSetDelete+0x98
65536 |@@  | 5 postgres`MemoryContextDelete+0x78
   131072 || 0 postgres`CommitTransaction+0x240
   262144 |@@@ |39
---
Count nsec Lock Caller
 53012226 0x10059e280  libumem.so.1`umem_cache_alloc+0x200


 nsec  Time Distribution --- count Stack
 4096 |@@@ |   338 libumem.so.1`umem_cache_alloc+0x200

 8192 |@   |24 libumem.so.1`umem_alloc+0x5c
16384 |@   |37 libumem.so.1`malloc+0x40
32768 |@   |   131 postgres`AllocSetAlloc+0x1c4
---
Count nsec Lock Caller
 32410214 0x100578030  libumem.so.1`vmem_xfree+0x164

 nsec  Time Distribution --- count Stack
 4096 |@@  |   192 libumem.so.1`vmem_xfree+0x164
 8192 ||56 libumem.so.1`process_free+0x12c
16384 |@   |26 postgres`AllocSetDelete+0x98
32768 |@@@ |50 postgres`MemoryContextDelete+0x78

---
Count nsec Lock Caller
 16113585 0x10059e280  libumem.so.1`process_free+0x12c

 nsec  Time Distribution --- count Stack
 4096 |@   |   118 libumem.so.1`process_free+0x12c
 8192 || 4 postgres`AllocSetDelete+0x98
16384 |@   |10 postgres`MemoryContextDelete+0x78

32768 |@@@ |24 postgres`PortalDrop+0x160
65536 || 3
   131072 || 0
   262144 || 2
---
Count nsec Lock Caller
 326 6081 libumem.so.1`vmem0+0xc38 libumem.so.1`vmem_xalloc+0x630

 nsec  Time Distribution --- count Stack
 4096 ||   170 libumem.so.1`vmem_xalloc+0x630
 8192 |@@@ |   155 libumem.so.1`vmem_alloc+0x1f8
16384 || 1 libumem.so.1`vmem_xalloc+0x524
   libumem.so.1`vmem_alloc+0x1f8
---
Count nsec Lock Caller
 326 5867 libumem.so.1`vmem0+0x30  libumem.so.1`vmem_alloc+0x248

 nsec  Time Distribution --- count Stack
 4096 |@   |   185 libumem.so.1`vmem_alloc+0x248
 8192 |@@  |   141 libumem.so.1`vmem_sbrk_alloc+0x30

   libumem.so.1`vmem_xalloc+0x524
   libumem.so.1`vmem_alloc+0x1f8
---
Count nsec Lock Caller
 318 5873 0x100578030  libumem.so.1`vmem_alloc+0x1d0

 nsec  Time Distribution --- count Stack
 4096 |@   |   228 libumem.so.1`vmem_alloc+0x1d0
 8192 |@   |78 libumem.so.1`umem_alloc+0xec
16384 || 6 libumem.so.1`malloc+0x40
32768 || 6 postgres`AllocSetAlloc+0x1c4
---
Count nsec Lock Caller
 326 5591

Re: [PERFORM] User concurrency thresholding: where do I look?

2007-07-20 Thread Jignesh K. Shah

sorry..

These are Solaris mutex locks used by the postgresql process.

What it is saying is that there are holds/waits trying to get locks 
that are taken at the Solaris user-library level, called from the 
postgresql functions.
For example, both of the following functions are hitting the same mutex 
lock (0x10059e280) in a Solaris library call:

postgres`AllocSetDelete+0x98
postgres`AllocSetAlloc+0x1c4


I need to enable the DTrace probes on my builds

-Jignesh
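
The plockstat summary above can also be reduced programmatically to confirm which lock is hottest. A sketch: it assumes the `Count nsec Lock Caller` row shape shown in the dump, and treats count multiplied by average nsec as total hold time per lock address; rows wrapped by mail formatting would need rejoining first.

```python
import re
from collections import defaultdict

# Sketch: total up plockstat "Count nsec Lock Caller" rows per lock
# address, so the dominant mutex (e.g. 0x10059e280) stands out.
# Time-distribution rows ("4096 |@@@ | ...") do not match the pattern
# and are ignored.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\S+)\s+\S+`\S+")

def hottest_locks(text):
    totals = defaultdict(int)           # lock -> sum(count * nsec)
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            count, nsec, lock = int(m.group(1)), int(m.group(2)), m.group(3)
            totals[lock] += count * nsec
    return sorted(totals.items(), key=lambda kv: -kv[1])
```

Running it over the full `plockstat -A` capture ranks lock addresses by aggregate hold time, which is the question being asked of the dump above.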

Tom Lane wrote:

Jignesh K. Shah [EMAIL PROTECTED] writes:
  

Tom Lane wrote:


So follow that up --- try to determine which lock is being contended
for.  There's some very crude code in the sources that you can enable
with -DLWLOCK_STATS, but probably DTrace would be a better tool.
  


  

Using plockstat -A -s 5 -p $pid



I don't know what that is, but it doesn't appear to have anything to do
with Postgres LWLocks or spinlocks, which are the locks I was thinking of.
Try asking Robert Lor about this --- IIRC he had some dtrace probes to
work with our locks.

regards, tom lane
  


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] User concurrency thresholding: where do I look?

2007-07-20 Thread Jignesh K. Shah
 12805   45  18   0  37
 50   0  212   2847 10590  262 3042  3880 11789   44  17   0  39

 61   0  154   307   48 9583  241 2903  3240 10564   50  15   0  35
 70   0  840   535  206 10354  247 3035  4280 11700   42  22   0  37

160   0  169   3035 7446  286 2250  2900  8361   42  13   0  45
170   0  173   2405 7640  225 2288  2950  8674   41  13   0  47
180   0  170   2895 7445  270 2108  2860  8167   41  12   0  47
190   0  176 511185 7365  197 2138  2880  7934   40  13   0  47
201   0  172   2226 7835  204 2323  2980  8759   40  14   0  46
210   0  167   2335 7749  218 2339  3260  8264   42  13   0  46
220   0  749  6612 6516 4173   97 1166  4210  4741   23  44   0  33
230   0  181   2396 7709  219 2258  3830  8402   41  12   0  47
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
 00   0  198   4396 10364  417 3113  3270 11962   49  17   0  34
 10   0  210   2996 10655  282 3135  3460 12463   47  17   0  36

 20   0  202   3528 9960  332 2890  3200 11261   47  16   0  37
 30   0  182   2766 9950  255 2857  3340 11021   46  16   0  38
 40   0  200   3056 10841  286 3127  3250 12440   48  18   0  35

 50   0  240   2866 9983  272 2912  3580 11450   46  16   0  37
 60   0  153   323   81 9062  233 2767  3000  9675   49  18   0  33
 70   0  850   556  206 10027  271 2910  4150 11048   43  22   0  35

160   0  152   3065 7261  291 2216  2660  8055   44  12   0  44
170   0  151   2365 7193  217 2170  2830  8099   43  12   0  45
180   0  170   2635 7008  246 2009  2540  7836   43  12   0  46
190   0  165 477385 6824  197 1989  2730  7663   42  12   0  46
200   0  188   2176 7496  197   2800  8435   43  13   0  44
210   0  179   2485 7352  234 2233  3090  8237   43  12   0  44
220   0  813  6041 5963 4006   82 1125  4480  4442   25  42   0  33
230   0  162   2415 7364  225 2170  3550  7720   43  11   0  45
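
The contention-relevant column in the mpstat dump above is smtx (spins on mutex locks). A small sketch that pulls it out and flags hot CPUs; it assumes the standard Solaris mpstat header row shown above, and rows mangled by mail wrapping simply fail the field-count check and are skipped:

```python
def high_smtx(lines, threshold=300):
    """Sketch: flag CPUs whose smtx (spins on mutex) column in mpstat
    output exceeds `threshold`.  Column positions are taken from the
    "CPU ..." header row rather than hard-coded."""
    cols, flagged = None, []
    for line in lines:
        parts = line.split()
        if parts[:1] == ["CPU"]:
            cols = parts                              # header row
        elif cols and len(parts) == len(cols) and parts[0].isdigit():
            cpu, smtx = int(parts[0]), int(parts[cols.index("smtx")])
            if smtx > threshold:
                flagged.append((cpu, smtx))
    return flagged
```

Watching which CPUs (and whether all of them) cross the threshold as users ramp up is one way to correlate the throughput drop with mutex contention.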




Tom Lane wrote:

Jignesh K. Shah [EMAIL PROTECTED] writes:
  
What its saying is that there are holds/waits in trying to get locks 
which are locked at Solaris user library levels called from the  
postgresql functions:
For example both the following functions are hitting on the same mutex 
lock  0x10059e280  in Solaris Library call:

postgres`AllocSetDelete+0x98
postgres`AllocSetAlloc+0x1c4



That's a perfect example of the sort of useless overhead that I was
complaining of just now in pgsql-patches.  Having malloc/free use
an internal mutex is necessary in multi-threaded programs, but the
backend isn't multi-threaded.  And yet, apparently you can't turn
that off in Solaris.

(Fortunately, the palloc layer is probably insulating us from malloc's
performance enough that this isn't a huge deal.  But it's annoying.)

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly
  


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] PostgreSQL publishes first real benchmark

2007-07-11 Thread Jignesh K. Shah
It's hard to do a direct comparison, since that one used a different 
commercial application server than the PostgreSQL test.. As we do more 
tests with the help of the community, hopefully we can understand where 
the CPU cycles are spent and see how to make them more efficient...


Stay tuned!!

-Jignesh


Philippe Amelant wrote:

Am I wrong, or is DB2 9.1 faster on less powerful hardware?

Le lundi 09 juillet 2007 à 11:57 -0400, Jignesh K. Shah a écrit :
  

Hello all,

I think this result will be useful for performance discussions of 
postgresql against other databases.


http://www.spec.org/jAppServer2004/results/res2007q3/

More on Josh Berkus's blog:

http://blogs.ittoolbox.com/database/soup/archives/postgresql-publishes-first-real-benchmark-17470

Regards,
Jignesh


---(end of broadcast)---
TIP 6: explain analyze is your friend



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate
  


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] PostgreSQL publishes first real benchmark

2007-07-11 Thread Jignesh K. Shah



Heikki Linnakangas wrote:



May I ask you why you set max_prepared_transactions to 450, while 
you're apparently not using prepared transactions, according to this 
quote:


Recoverable 2-phase transactions were used to coordinate the 
interaction between

the database server and JMS server using Sun's Last Agent Logging
Optimization; the 1PC database transactions and transaction log 
records are

written to the database in a single transaction.


Did you perhaps use 2PC at first, but didn't bother to change the 
config after switching to the last agent optimization?




Yep.. one of the things that we didn't revert and that strayed out 
there.


-Jignesh



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] PostgreSQL publishes first real benchmark

2007-07-11 Thread Jignesh K. Shah

Can you list others that seemed out of place?

Thanks.
Regards,
Jignesh


Tom Lane wrote:

Jignesh K. Shah [EMAIL PROTECTED] writes:
  

Heikki Linnakangas wrote:

May I ask you why you set max_prepared_transactions to 450, while 
you're apparently not using prepared transactions, according to this 
quote:
  


  
Yep.. one of the things that we didn't revert and that strayed out 
there.



There were quite a few settings in that list that looked like random
experimentation rather than recommendable good practice to me.

regards, tom lane
  


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] PostgreSQL publishes first real benchmark

2007-07-09 Thread Jignesh K. Shah

Hello all,

I think this result will be useful for performance discussions of 
postgresql against other databases.


http://www.spec.org/jAppServer2004/results/res2007q3/

More on Josh Berkus's blog:

http://blogs.ittoolbox.com/database/soup/archives/postgresql-publishes-first-real-benchmark-17470

Regards,
Jignesh


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] PostgreSQL publishes first real benchmark

2007-07-09 Thread Jignesh K. Shah


Hi Heikki,

Heikki Linnakangas wrote:


That's really exciting news!

I'm sure you spent a lot of time tweaking the settings, so let me ask 
you something topical:


How did you end up with the bgwriter settings you used? Did you 
experiment with different values? How much difference did it make?




The background writer is still a pain to get right.. I'd say it is a 
necessary evil, since you are trying to balance leveling writes to the 
disks against the lock contention the writer itself causes for the 
postgresql connections. Our typical problem arises at high user counts, 
where all users are suddenly blocked because the bgwriter is holding 
the locks.. Using the hotuser script (which uses a Perl/DTrace 
combination) we ran quite a few numbers, trying to see which settings 
result in less overall time spent in PGLock* calls while still giving 
good, uniform writes to the disks. After reaching the published settings, 
every now and then we would try playing with different values to see if 
anything improved, but changes generally seemed to degrade things.. 
(Of course your mileage will vary depending on config, workload, etc.)
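
The "time spent in PGLock* calls" criterion can be computed from sampled-profile output. A sketch follows; the `FUNCTION COUNT` row shape and the lock-function prefixes are assumptions for illustration, not the actual hotuser output format:

```python
def lock_fraction(profile_lines,
                  prefixes=("postgres`LWLock", "postgres`PGSemaphore")):
    """Sketch: fraction of profile samples landing in lock-related
    functions -- the kind of metric described above for comparing
    bgwriter settings.  Expects "FUNCTION COUNT" rows; anything else
    is skipped."""
    total = locked = 0
    for line in profile_lines:
        parts = line.split()
        if len(parts) < 2 or not parts[1].isdigit():
            continue
        count = int(parts[1])
        total += count
        if parts[0].startswith(prefixes):
            locked += count
    return locked / total if total else 0.0
```

A sweep over bgwriter settings would then pick the configuration minimizing this fraction while keeping disk writes uniform.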


Still, I believe the locking mechanism needs to be revisited at some 
point, since that seems to be what will eventually limit the number of 
users in such a workload (especially if you don't hit the right settings 
for your workload).


Hopefully soon we will get access to bigger capacity servers and redo 
SMP tests on it with the background writer.


Regards,
Jignesh


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Direct I/O

2007-07-06 Thread Jignesh K. Shah
On Solaris you just look at the mount options on the file system and see
if the forcedirectio option is enabled. Since PostgreSQL doesn't use any
special options for enabling direct I/O, that's the known way to figure
it out on Solaris. At least on Solaris, performance over a buffered
filesystem is better for many workloads, but not always. Plus you
typically see a small reduction in CPU usage (system) and, of course,
memory.

However, depending on workload, you may see increased latency in writes,
but generally that's not the problem in many workloads, since it's the
multiple writers to the same file that do better using concurrent I/O
(modified direct I/O) on Solaris.

As for Linux I will leave that to other experts ..

-Jignesh
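
The mount-option check described above can be scripted. A sketch that scans Solaris /etc/mnttab-style lines (device, mount point, fstype, comma-separated options, time) for forcedirectio:

```python
def directio_mounts(mnttab_text):
    """Sketch: return mount points mounted with forcedirectio, given
    Solaris /etc/mnttab-style lines:
        special  mount_point  fstype  options  time
    Lines with fewer than four fields are ignored."""
    hits = []
    for line in mnttab_text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and "forcedirectio" in parts[3].split(","):
            hits.append(parts[1])
    return hits
```

If the file system holding the data directory does not appear in the result, PostgreSQL is going through the buffered path on that mount.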


lai yoke hman wrote:
 Hello,
 How can I know whether my PostgreSQL 8 is using direct I/O or buffered I/O? If it 
 is using buffered I/O, how can I enable direct I/O? What is the performance 
 difference between them?
 This is urgent. Thanks.
 _
 Windows Live Spaces is here! It’s easy to create your own personal Web site. 
 http://spaces.live.com/?mkt=en-my

 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings
   

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] slow i/o

2006-09-28 Thread Jignesh K. Shah
 wait actv wsvc_t asvc_t  %w  %b device
 1.0  268.0    8.0 2839.9   0.0  97.1    0.0  360.9   0 100 c1t0d0s6 (/usr)

 cpu
 us sy wt id
 11 10  0 80
extended device statistics
 r/s    w/s   kr/s   kw/s  wait  actv wsvc_t asvc_t  %w  %b device
 0.0  309.0    0.0     .5 175.2 208.9  566.9  676.2  81  99 c1t0d0s6 (/usr)

 cpu
 us sy wt id
  0  0  0 100
extended device statistics
 r/s    w/s   kr/s   kw/s  wait  actv wsvc_t asvc_t  %w  %b device
 0.0  330.0    0.0 2704.0 145.6 256.0  441.1  775.7 100 100 c1t0d0s6 (/usr)

 cpu
 us sy wt id
  4  2  0 94
extended device statistics
 r/s    w/s   kr/s   kw/s  wait  actv wsvc_t asvc_t  %w  %b device
 0.0  311.0    0.0 2543.9 151.0 256.0  485.6  823.2 100 100 c1t0d0s6 (/usr)

 cpu
 us sy wt id
  2  0  0 98
extended device statistics
 r/s    w/s   kr/s   kw/s  wait  actv wsvc_t asvc_t  %w  %b device
 0.0  319.0    0.0 2576.0 147.4 256.0  462.0  802.5 100 100 c1t0d0s6 (/usr)

 cpu
 us sy wt id
  0  1  0 98
extended device statistics
 r/s    w/s   kr/s   kw/s  wait  actv wsvc_t asvc_t  %w  %b device
 0.0    0.0    0.0    0.0   0.0   0.2    0.0    0.0   2  13 c1t0d0s1 (/var)
 0.0  366.0    0.0 3088.0 124.4 255.8  339.9  698.8 100 100 c1t0d0s6 (/usr)

 cpu
 us sy wt id
  6  5  0 90
extended device statistics
 r/s    w/s   kr/s   kw/s  wait  actv wsvc_t asvc_t  %w  %b device
 0.0    2.0    0.0   16.0   0.0   1.1    0.0  533.2   0  54 c1t0d0s1 (/var)
 1.0  282.0    8.0 2849.0   1.5 129.2    5.2  456.5  10 100 c1t0d0s6 (/usr)


Thank you in advance for your help!

Jun
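
The pattern in the iostat output above (a deep wait queue while %b sits at 100) can be flagged automatically. A sketch, assuming the extended-statistics column order `r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device`:

```python
def saturated_devices(lines, busy_pct=95.0, max_wait=100.0):
    """Sketch: flag devices whose %b (busy) or wait-queue length looks
    saturated in `iostat -x`-style extended rows.  Header and cpu rows
    fail the float() parse or the field count and are skipped."""
    hits = set()
    for line in lines:
        parts = line.split()
        if len(parts) < 11:
            continue
        try:
            wait, busy = float(parts[4]), float(parts[9])
        except ValueError:
            continue                       # header or non-data row
        if busy >= busy_pct or wait >= max_wait:
            hits.add(parts[10])
    return hits
```

Run against the capture above, it would single out c1t0d0s6 during the spike windows.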

On 8/30/06, *Junaili Lie* [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] 
wrote:


I have tried this to no avail.
I have also tried changing the bgwriter_delay parameter to 10. The
spike in i/o still occurs, although not on a consistent basis, and it
is only happening for a few seconds.
 



 
On 8/30/06, *Jignesh K. Shah* [EMAIL PROTECTED]

mailto:[EMAIL PROTECTED]  wrote:

The bgwriter parameters changed in 8.1

Try

bgwriter_lru_maxpages=0
bgwriter_lru_percent=0

to turn off bgwriter and see if there is any change.

-Jignesh


Junaili Lie wrote:
  Hi Jignesh,
  Thank you for my reply.
  I have the setting just like what you described:

  wal_sync_method = fsync
  wal_buffers = 128
  checkpoint_segments = 128
  bgwriter_all_percent = 0
  bgwriter_maxpages = 0


  I ran the dtrace script and found the following:
  During the i/o busy time, there are postgres processes that
has very
  high BYTES count. During that non i/o busy time, this same process
  doesn't do a lot of i/o activity. I checked the
pg_stat_activity but
  couldn't found this process. Doing ps revealed that this
process is
  started at the same time since the postgres started, which
leads me to
  believe that it maybe background writer or some other internal
process.
  This process are not autovacuum because it doesn't disappear
when I
  tried turning autovacuum off.
  Except for the ones mentioned above, I didn't modify the other
  background setting:
  MONSOON=# show bgwriter_delay ;
  bgwriter_delay
  
  200
  (1 row)

  MONSOON=# show bgwriter_lru_maxpages ;  bgwriter_lru_maxpages
  ---
  5
  (1 row)

  MONSOON=# show bgwriter_lru_percent ;
  bgwriter_lru_percent
  --
  1
  (1 row)

  This i/o spike only happens at minute 1 and minute 6 (ie.
10.51, 10.56 )
  . If I do select * from pg_stat_activity during this time, I
will see a
  lot of write queries waiting to be processed. After a few seconds,
  everything seems to be gone. All writes that are not happening
at the
  time of this i/o jump are being processed very fast, thus do
not show on
  pg_stat_activity.

  Thanks in advance for the reply,
  Best,

  J

  On 8/29/06, *Jignesh K. Shah*  [EMAIL PROTECTED]
mailto:[EMAIL PROTECTED]
  mailto: [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote:

 Also to answer your real question:

 DTrace On Solaris 10:

 # dtrace -s /usr/demo/dtrace/whoio.d

 It will tell you the pids doing the io activity and  on
which devices.
 There are more scripts in that directory like iosnoop.d,
iotime.d
 and others which also will give

Re: [PERFORM] slow i/o

2006-08-30 Thread Jignesh K. Shah

The bgwriter parameters changed in 8.1

Try

bgwriter_lru_maxpages=0
bgwriter_lru_percent=0

to turn off bgwriter and see if there is any change.

-Jignesh


Junaili Lie wrote:

Hi Jignesh,
Thank you for my reply.
I have the setting just like what you described:

wal_sync_method = fsync
wal_buffers = 128
checkpoint_segments = 128
bgwriter_all_percent = 0
bgwriter_maxpages = 0


I ran the dtrace script and found the following:
During the i/o busy time, there are postgres processes that have very 
high BYTES counts. During the non-busy time, these same processes 
don't do a lot of i/o activity. I checked pg_stat_activity but 
couldn't find this process. Doing ps revealed that this process has 
been running since postgres started, which leads me to believe that it 
may be the background writer or some other internal process. 
This process is not autovacuum, because it doesn't disappear when I 
tried turning autovacuum off.
Except for the ones mentioned above, I didn't modify the other 
background setting:

MONSOON=# show bgwriter_delay ;
 bgwriter_delay

 200
(1 row)

MONSOON=# show bgwriter_lru_maxpages ;  bgwriter_lru_maxpages
---
 5
(1 row)

MONSOON=# show bgwriter_lru_percent ;
 bgwriter_lru_percent
--
 1
(1 row)

This i/o spike only happens at minute 1 and minute 6 (i.e. 10:51, 10:56). 
If I do select * from pg_stat_activity during this time, I will see a 
lot of write queries waiting to be processed. After a few seconds, 
everything seems to be gone. All writes that are not happening at the 
time of this i/o jump are being processed very fast, and thus do not show 
in pg_stat_activity.


Thanks in advance for the reply,
Best,

J

On 8/29/06, *Jignesh K. Shah* [EMAIL PROTECTED] 
mailto:[EMAIL PROTECTED] wrote:


Also to answer your real question:

DTrace On Solaris 10:

# dtrace -s /usr/demo/dtrace/whoio.d

It will tell you the pids doing the io activity and  on which devices.
There are more scripts in that directory like iosnoop.d, iotime.d
and others which also will give
other details like file accessed, time it took for the io etc.

Hope this helps.

Regards,
Jignesh


Junaili Lie wrote:
  Hi everyone,
  We have a postgresql 8.1 installed on Solaris 10. It is running fine.
  However, for the past couple days, we have seen the i/o reports
  indicating that the i/o is busy most of the time. Before this, we
only
  saw i/o being busy occasionally (very rare). So far, there has
been no
  performance complaints by customers, and the slow query reports
doesn't
  indicate anything out of the ordinary.
  There's no code changes on the applications layer and no database
  configuration changes.
  I am wondering if there's a tool out there on Solaris to tell which
  process is doing most of the i/o activity?
  Thank you in advance.
 
  J
 




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] slow i/o

2006-08-29 Thread Jignesh K. Shah

Did you increase checkpoint_segments and change the WAL sync method 
to fdsync?

http://blogs.sun.com/jkshah/entry/postgresql_on_solaris_better_use

Try fdsync instead of fsync as mentioned in the entry.

Regards,
Jignesh


Junaili Lie wrote:

Hi everyone,
We have a postgresql 8.1 installed on Solaris 10. It is running fine. 
However, for the past couple of days, we have seen i/o reports 
indicating that the i/o is busy most of the time. Before this, we only 
saw i/o being busy occasionally (very rarely). So far, there have been no 
performance complaints from customers, and the slow query reports don't 
indicate anything out of the ordinary.
There's no code changes on the applications layer and no database 
configuration changes.
I am wondering if there's a tool out there on Solaris to tell which 
process is doing most of the i/o activity?

Thank you in advance.

J



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Pgsql (and mysql) benchmark on T2000/Solaris and some

2006-05-17 Thread Jignesh K. Shah

Your user time is way too high for a T2000...

If you have a 6-core machine with 24 threads, the iostat output says all 
24 threads are busy.


Best way to debug this is  use

prstat -amL
(or, if you are dumping it to a file, prstat -amLc > prstat.txt)

and find the pids with high user cpu time  and then use the usrcall.d on 
few of those pids.
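
Picking those pids out of a prstat dump can be scripted too. A sketch, assuming `prstat -amL` microstate rows beginning `PID USERNAME USR SYS ...`:

```python
def busy_pids(prstat_lines, usr_threshold=80.0):
    """Sketch: from `prstat -amL` microstate rows, collect pids whose
    USR column exceeds the threshold -- the candidates to feed to
    usrcall.d.  The header row fails the isdigit() check and is
    skipped."""
    pids = []
    for line in prstat_lines:
        parts = line.split()
        if len(parts) < 3 or not parts[0].isdigit():
            continue
        try:
            usr = float(parts[2])
        except ValueError:
            continue
        if usr >= usr_threshold:
            pids.append(int(parts[0]))
    return pids
```

The resulting pid list is exactly what the follow-up DTrace profiling step wants as input.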


Also how many database connections do you have and what's the type of 
query run by each connection?


-Jignesh



Arjen van der Meijden wrote:

Hi Jignesh,

The settings from that 'special T2000 dvd' differed from the recommended 
settings on the website you provided. But I don't see much difference in 
performance with any of the adjustments, it appears to be more or less 
the same.


Here are a few iostat lines by the way:

sd0   sd1   sd2   nfs1   cpu
kps tps serv  kps tps serv  kps tps serv  kps tps serv   us sy wt id
  7   1   12  958  50   350   070   00   13  1  0 85
  0   00  2353 29630   000   00   92  7  0  1
  0   00  2062 32620   000   00   93  7  0  0
  1   11  1575 35000   000   00   92  7  0  1
  0   00  1628 36200   000   00   92  8  0  1

It appears to be doing a little less kps/tps on sd1 when I restore my 
own postgresql.conf-settings. (default wal/checkpoints, 20k buffers, 2k 
work mem).


Is it possible to trace the stacks for semsys, like the memcpy-traces, 
or are those of no interest here?


Best regards,

Arjen


On 16-5-2006 17:52, Jignesh K. Shah wrote:


Hi Arjen,

Can you send me my colleagues's names in a private email?

One of the drawbacks of the syscall.d script is that its timings are 
relative; if system CPU usage is very low, it only shows what portion of 
that small amount is associated with which call. So even if you have, 
say, 1% system time, it says that most of it was IO related or semsys 
related. So iostat output with the -c option to include CPU times helps 
to put it in the right perspective.



Also do check the tunables mentioned and make sure they are set.

Regards,
Jignesh


Arjen van der Meijden wrote:


Hi Jignesh,

Jignesh K. Shah wrote:


Hi Arjen,

Looking at your outputs...of syscall and usrcall it looks like

* Spending too much time in semsys  which means you have too 
many connections and they are contending to get a lock.. which is 
potentially the WAL log lock


* llseek is high which means you can obviously gain a bit with the 
right file system/files tuning by caching them right.


Have you set the values for Solaris for T2000 tuned for Postgresql?




Not particularly, we got a special T2000 Solaris dvd from your 
colleagues here in the Netherlands and installed that (actually one 
of your colleagues did). Doing so all the better default 
/etc/system-settings are supposed to be set. I haven't really checked 
that they are, since two of your colleagues have been working on it 
for the mysql-version of the benchmark and I assumed they'd have 
verified that.



Check out the tunables from the following URL

http://www.sun.com/servers/coolthreads/tnb/applications_postgresql.jsp

Try specially the /etc/system and postgresql.conf changes  and see 
if it changes/improves your performance.




I will see that those tunables are verified to be set.

I am a bit surprised though about your remarks, since they'd point at 
the I/O being in the way? But we only have about 600k/sec i/o 
according to vmstat. The database easily fits in memory.
In total I logged about 500k queries of which only 70k where altering 
queries, of which almost all where inserts in log-tables which aren't 
actively read in this benchmark.


But I'll give it a try.

Best regards,

Arjen



Arjen van der Meijden wrote:


Hi List,

In the past few weeks we have been developing a read-heavy 
mysql-benchmark to have an alternative take at 
cpu/platform-performance. Not really to have a look at how fast 
mysql can be.


This benchmark runs on mysql 4.1.x, 5.0.x and 5.1.x and is modelled 
after our website's production database and the load generated on 
it is modelled after a simplified version of our visitor behaviour.


Long story short, we think the test is a nice example of the 
relatively lightweight, read-heavy webapplications out there and 
therefore decided to have a go at postgresql as well.
Of course the queries and indexes have been adjusted to (by our 
knowledge) best suit postgresql, while maintaining the same output 
to the application/interface layer. While the initial structure 
only got postgresql at about half the performance of mysql 4.1.x, 
the current version of our postgresql-benchmark has quite similar 
results to mysql 4.1.x, but both are quite a bit slower than 5.0.x 
(I think it's about 30-40% faster).


Since the results from those benchmarks are not yet public (they 
will be put together in a story at our website), I won't go into 
too much

Re: [PERFORM] Pgsql (and mysql) benchmark on T2000/Solaris and some

2006-05-17 Thread Jignesh K. Shah
I have seen MemoryContextSwitchTo taking time before.. However, I am not 
sure why it would take so much CPU time?

Maybe that function does not work efficiently on Solaris?

Also, I do not know much about slot_getattr.

Anybody else? (Another option is to use collect -p $pid experiments to 
gather the data to figure out which instruction is causing the high CPU 
usage.) Maybe the Sun engineers out there can help out


-Jignesh


Arjen van der Meijden wrote:
We have the 4-core machine. However, these numbers are taken during a 
benchmark, not normal workload. So the output should display the system 
working fully ;)


So it's postgres doing a lot of work, and you already had a look at the 
usrcall for that.


The benchmark just tries to do the queries for random page visits. 
This totals up to about 80 different queries being executed with 
mostly random parameters. The workload is generated using php, so there 
are no connection pools, nor prepared statements.


The queries vary, but are all relatively lightweight queries with less 
than 6 or 7 joinable tables. Almost all queries can use indexes. Most 
tables are under a few MB of data, although there are a few larger than 
that. Most records are relatively small, consisting of mostly numbers 
(id's and such).


The results presented here were with 25 concurrent connections.

Best regards,

Arjen


Jignesh K. Shah wrote:


Your user time is way too high for a T2000...

If you have a 6 core machine with 24 threads, the iostat output reports 
all 24 threads as being busy.


Best way to debug this is  use

prstat -amL
(or, if you are dumping it to a file, prstat -amLc > prstat.txt)

and find the pids with high user cpu time, and then use usrcall.d 
on a few of those pids.
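
To make that workflow concrete, here is a small sketch of pulling the
high user-CPU pids out of saved prstat -amL output. The prstat sample
below is illustrative only (not real output from this benchmark), and
usrcall.d refers to the script from Jignesh's blog:

```shell
# Illustrative prstat -amL sample; column 3 (USR) is percent user CPU.
cat > prstat.txt <<'EOF'
 PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
 2134 postgres  87 5.0 0.0 0.0 0.0 0.0 8.0 0.0  1K 300  5K   0 postgres/1
 2140 postgres   2 1.0 0.0 0.0 0.0 0.0  97 0.0  10  30 200   0 postgres/1
EOF

# Print pids whose user CPU exceeds 50%; these are the candidates
# to profile with usrcall.d on Solaris.
awk 'NR > 1 && $3 + 0 > 50 { print $1 }' prstat.txt   # prints 2134
```

On the Solaris box you would then run Jignesh's usrcall.d against one
of the printed pids to get the hot user stacks.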


Also how many database connections do you have and what's the type of 
query run by each connection?


-Jignesh



Arjen van der Meijden wrote:


Hi Jignesh,

The settings from that 'special T2000 dvd' differed from the 
recommended settings on the website you provided. But I don't see 
much difference in performance with any of the adjustments, it 
appears to be more or less the same.


Here are a few iostat lines by the way:

              sd0            sd1            sd2           nfs1       cpu
 kps tps serv  kps  tps serv  kps tps serv  kps tps serv  us sy wt id
   7   1   12   958   50    3    5   0    0    7   0    0  13  1  0 85
   0   0    0  2353  296    3    0   0    0    0   0    0  92  7  0  1
   0   0    0  2062  326    2    0   0    0    0   0    0  93  7  0  0
   1   1    1  1575  350    0    0   0    0    0   0    0  92  7  0  1
   0   0    0  1628  362    0    0   0    0    0   0    0  92  8  0  1

It appears to be doing a little less kps/tps on sd1 when I restore my 
own postgresql.conf-settings. (default wal/checkpoints, 20k buffers, 
2k work mem).


Is it possible to trace the stacks for semsys, like the 
memcpy traces, or are those of no interest here?


Best regards,

Arjen


On 16-5-2006 17:52, Jignesh K. Shah wrote:


Hi Arjen,

Can you send me my colleagues' names in a private email?

One of the drawbacks of the syscall.d script is that its timings are 
relative: if system CPU usage is very low, it still reports the 
relative weight of each call within that small total. So even if you 
have, say, 1% system time, it says that most of it was IO related or 
semsys related. That is why iostat output with the -c option to 
include CPU times helps put it in the right perspective.



Also do check the tunables mentioned and make sure they are set.

Regards,
Jignesh


Arjen van der Meijden wrote:


Hi Jignesh,

Jignesh K. Shah wrote:


Hi Arjen,

Looking at your outputs...of syscall and usrcall it looks like

* Spending too much time in semsys, which means you have too 
many connections and they are contending to get a lock, which is 
potentially the WAL log lock


* llseek is high, which means you can obviously gain a bit with the 
right file system/file tuning by caching the files right.


Have you set the Solaris values tuned for Postgresql on the T2000?





Not particularly; we got a special T2000 Solaris dvd from your 
colleagues here in the Netherlands and installed that (actually one 
of your colleagues did). By doing so, all the better default 
/etc/system settings are supposed to be set. I haven't really 
checked that they are, since two of your colleagues have been 
working on it for the mysql version of the benchmark and I assumed 
they'd have verified that.



Check out the tunables from the following URL

http://www.sun.com/servers/coolthreads/tnb/applications_postgresql.jsp 



Try especially the /etc/system and postgresql.conf changes and see 
if they improve your performance.





I will see that those tunables are verified to be set.

I am a bit surprised by your remarks though, since they'd point 
at the I/O being in the way? But we only have about 600k/sec of i/o 
according to vmstat. The database easily fits in memory.
In total I logged

Re: [PERFORM] Pgsql (and mysql) benchmark on T2000/Solaris and some

2006-05-16 Thread Jignesh K. Shah

Hi Arjen,

Looking at your outputs...of syscall and usrcall it looks like

* Spending too much time in semsys, which means you have too many 
connections and they are contending to get a lock, which is potentially 
the WAL log lock



* llseek is high, which means you can obviously gain a bit with the right 
file system/file tuning by caching the files right.



Have you set the Solaris values tuned for Postgresql on the T2000?

Check out the tunables from the following URL

http://www.sun.com/servers/coolthreads/tnb/applications_postgresql.jsp

Try especially the /etc/system and postgresql.conf changes and see if 
they improve your performance.



Regards,
Jignesh


Arjen van der Meijden wrote:

Hi List,

In the past few weeks we have been developing a read-heavy 
mysql benchmark to have an alternative take on cpu/platform performance, 
not really to look at how fast mysql can be.


This benchmark runs on mysql 4.1.x, 5.0.x and 5.1.x and is modelled 
after our website's production database and the load generated on it is 
modelled after a simplified version of our visitor behaviour.


Long story short, we think the test is a nice example of the relatively 
lightweight, read-heavy web applications out there and therefore decided 
to have a go at postgresql as well.
Of course the queries and indexes have been adjusted to (to our 
knowledge) best suit postgresql, while maintaining the same output to 
the application/interface layer. While the initial structure only got 
postgresql to about half the performance of mysql 4.1.x, the current 
version of our postgresql-benchmark has quite similar results to mysql 
4.1.x, but both are quite a bit slower than 5.0.x (I think it's about 
30-40% faster).


Since the results from those benchmarks are not yet public (they will be 
put together in a story at our website), I won't go into too much 
detail about this benchmark.


Currently we're having a look at a Sun T2000 and will be looking at 
other machines as well in the future. We are running the 
sun release of postgresql 8.1.3 on that T2000 now, but are looking at 
compiling the cvs-head version (for its index-root-cache) somewhere this 
week.


My guess is there are a few people on this list who are interested in 
some dtrace results taken during our benchmarks on that T2000.
Although my knowledge of both Solaris and Dtrace are very limited, I 
already took some samples of the system and user calls. I used Jignesh 
Shah's scripts for that: 
http://blogs.sun.com/roller/page/jkshah?entry=profiling_postgresql_using_dtrace_on 



You can find the samples here:
http://achelois.tweakers.net/~acm/pgsql-t2000/syscall.log
http://achelois.tweakers.net/~acm/pgsql-t2000/usrcall.log

And I also did the memcpy-scripts, here:
http://achelois.tweakers.net/~acm/pgsql-t2000/memcpysize.log
http://achelois.tweakers.net/~acm/pgsql-t2000/memcpystack.log
(this last log is 3.5MB)

If anyone is interested in some more dtrace results, let me know (and 
tell me what commands to run ;-) ).


Best regards,

Arjen

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq




Re: [PERFORM] Pgsql (and mysql) benchmark on T2000/Solaris and some

2006-05-16 Thread Jignesh K. Shah

Hi Arjen,

Can you send me my colleagues' names in a private email?

One of the drawbacks of the syscall.d script is that its timings are 
relative: if system CPU usage is very low, it still reports the relative 
weight of each call within that small total. So even if you have, say, 
1% system time, it says that most of it was IO related or semsys 
related. That is why iostat output with the -c option to include CPU 
times helps put it in the right perspective.



Also do check the tunables mentioned and make sure they are set.

Regards,
Jignesh


Arjen van der Meijden wrote:


Hi Jignesh,

Jignesh K. Shah wrote:


Hi Arjen,

Looking at your outputs...of syscall and usrcall it looks like

* Spending too much time in semsys, which means you have too many 
connections and they are contending to get a lock, which is 
potentially the WAL log lock


* llseek is high, which means you can obviously gain a bit with the 
right file system/file tuning by caching the files right.


Have you set the Solaris values tuned for Postgresql on the T2000?



Not particularly; we got a special T2000 Solaris dvd from your 
colleagues here in the Netherlands and installed that (actually one of 
your colleagues did). By doing so, all the better default 
/etc/system settings are supposed to be set. I haven't really checked 
that they are, since two of your colleagues have been working on it 
for the mysql version of the benchmark and I assumed they'd have 
verified that.



Check out the tunables from the following URL

http://www.sun.com/servers/coolthreads/tnb/applications_postgresql.jsp

Try especially the /etc/system and postgresql.conf changes and see if 
they improve your performance.



I will see that those tunables are verified to be set.

I am a bit surprised by your remarks though, since they'd point at 
the I/O being in the way? But we only have about 600k/sec of i/o 
according to vmstat. The database easily fits in memory.
In total I logged about 500k queries of which only 70k where altering 
queries, of which almost all where inserts in log-tables which aren't 
actively read in this benchmark.


But I'll give it a try.

Best regards,

Arjen



Arjen van der Meijden wrote:


Hi List,

In the past few weeks we have been developing a read-heavy 
mysql benchmark to have an alternative take on 
cpu/platform performance, not really to look at how fast 
mysql can be.


This benchmark runs on mysql 4.1.x, 5.0.x and 5.1.x and is modelled 
after our website's production database and the load generated on it 
is modelled after a simplified version of our visitor behaviour.


Long story short, we think the test is a nice example of the 
relatively lightweight, read-heavy web applications out there and 
therefore decided to have a go at postgresql as well.
Of course the queries and indexes have been adjusted to (to our 
knowledge) best suit postgresql, while maintaining the same output 
to the application/interface layer. While the initial structure only 
got postgresql to about half the performance of mysql 4.1.x, the 
current version of our postgresql-benchmark has quite similar 
results to mysql 4.1.x, but both are quite a bit slower than 5.0.x 
(I think it's about 30-40% faster).


Since the results from those benchmarks are not yet public (they 
will be put together in a story at our website), I won't go into too 
much detail about this benchmark.


Currently we're having a look at a Sun T2000 and will be looking 
at other machines as well in the future. We are 
running the sun release of postgresql 8.1.3 on that T2000 now, but 
are looking at compiling the cvs-head version (for its 
index-root-cache) somewhere this week.


My guess is there are a few people on this list who are interested 
in some dtrace results taken during our benchmarks on that T2000.
Although my knowledge of both Solaris and Dtrace are very limited, I 
already took some samples of the system and user calls. I used 
Jignesh Shah's scripts for that: 
http://blogs.sun.com/roller/page/jkshah?entry=profiling_postgresql_using_dtrace_on 



You can find the samples here:
http://achelois.tweakers.net/~acm/pgsql-t2000/syscall.log
http://achelois.tweakers.net/~acm/pgsql-t2000/usrcall.log

And I also did the memcpy-scripts, here:
http://achelois.tweakers.net/~acm/pgsql-t2000/memcpysize.log
http://achelois.tweakers.net/~acm/pgsql-t2000/memcpystack.log
(this last log is 3.5MB)

If anyone is interested in some more dtrace results, let me know 
(and tell me what commands to run ;-) ).


Best regards,

Arjen








Re: [PERFORM] bad performance on Solaris 10

2006-04-14 Thread Jignesh K. Shah

Hi Bruce,


Even on this alias I saw that people assumed that the default 
wal_sync_method was fsync on Solaris.


I would select fsync or fdatasync as the default on Solaris. (I prefer 
fsync, as it is already highlighted as the default in postgresql.)


Another thing to improve the defaults on Solaris will be to increase the 
defaults of

wal_buffers
and
checkpoint_segments

(I think in 8.1 checkpoint_segments has already been raised to a 
default of 8 from the previous 3, and that may already be some help for 
performance out there.)


These three changes improve out-of-box performance of PostgreSQL quite a 
bit on Solaris (SPARC as well as x64 platforms).
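
The disk-cost side of raising checkpoint_segments is easy to estimate;
a rough sketch, using the 8.x-era rule of thumb that pg_xlog normally
holds at most 2 * checkpoint_segments + 1 segments of 16MB each:

```shell
# Approximate peak pg_xlog size for a given checkpoint_segments setting.
checkpoint_segments=64
echo "$(( (2 * checkpoint_segments + 1) * 16 )) MB"   # prints "2064 MB"
```

So a setting of 64 costs about 2GB of pg_xlog disk, which is usually a
cheap trade for fewer checkpoints on a write-heavy load.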


Then you will suddenly see a decrease in the number of people in the 
PostgreSQL community complaining about Solaris 10, as well as in the 
Solaris community complaining about PostgreSQL. (The benefits are mutual.)


Don't get me wrong. As Luke mentioned, it took a while to realize the 
potential of PostgreSQL on Solaris, and people like me started doing 
other complex workarounds in Solaris like forcedirectio, etc. (Yes, I 
did a test: if you force fsync as wal_sync_method on Solaris, then you 
may not even need forcedirectio on your $PGDATA/pg_xlog to get back the 
lost performance.)


If we had realized that the fsync/open_datasync difference was the 
culprit, we could have saved a couple of months of effort.


Yes, I agree that putting OS-specific things in PostgreSQL hurts the 
community, and sticking to POSIX standards helps.


Just my two cents.

Regards,
Jignesh


Bruce Momjian wrote:


Jignesh K. Shah wrote:
 


Bruce,

Hard to answer that... People like me who know and love PostgreSQL and 
Solaris find this an opportunity to make their favorite database 
work best on their favorite operating system.


Many times PostgreSQL bases things on the assumption that it will 
run on Linux, and it is left to Solaris to emulate that behavior. That 
said, there are ways to improve performance even on UFS on Solaris; it 
just requires more tweaks.


Hopefully this will lead to a few Solaris-friendly default values like 
fsync/open_datasync :-)
   



Yes, if someone wants to give us a clear answer on which wal_sync method
is best on all versions of Solaris, we can easily make that change.

 





Re: [PERFORM] bad performance on Solaris 10

2006-04-12 Thread Jignesh K. Shah


Bruce,

Hard to answer that... People like me who know and love PostgreSQL and 
Solaris find this an opportunity to make their favorite database 
work best on their favorite operating system.


Many times PostgreSQL bases things on the assumption that it will 
run on Linux, and it is left to Solaris to emulate that behavior. That 
said, there are ways to improve performance even on UFS on Solaris; it 
just requires more tweaks.


Hopefully this will lead to a few Solaris-friendly default values like 
fsync/open_datasync :-)


Regards,
Jignesh


Bruce Momjian wrote:



It is hard to imagine why people spend so much time modifying Sun
machines to run with acceptable performance when non-Sun operating systems
work fine without such hurdles.
 





Re: [PERFORM] Sun Fire T2000 and PostgreSQL 8.1.3

2006-04-06 Thread Jignesh K. Shah

Hi Leigh

inline comments


Leigh Dyer wrote:


Luke Lonergan wrote:


Juan,


We've got a Sun Fire V40z and it's quite a nice machine -- 6x 15krpm 
drives, 4GB RAM, and a pair of Opteron 850s. This gives us more than 
enough power now for what we need, but it's nice to know that we can 
shoehorn a lot more RAM, and up it to eight CPU cores if needed.


The newer Sun Opteron systems look nice too, but unless you're using 
external storage, their little 2.5-inch hard drives may not be ideal.




That's because the Sun Fire V40z had the write cache turned on while the 
4200/4100 has the write cache turned off. There is a religious belief 
around the on-disk write cache at Sun :-)  To really compare the 
performance, you have to turn on the write cache (I believe it was 
format -e and the cache option, but that could have changed; I need to 
verify that again). The same goes for the T2000 SAS disks: the write 
cache is turned off on them too, so be careful before you compare 
benchmarks on internal drives :-)


-Jignesh




Thanks
Leigh



- Luke












Re: [PERFORM] Sun Fire T2000 and PostgreSQL 8.1.3

2006-04-06 Thread Jignesh K. Shah
For a DSS-type workload with PostgreSQL, where you end up with single 
long-running queries against about 100GB, you are better off using 
something like a Sun Fire V40z with those fast Ultra320 internal drives. 
This might be the perfect low-cost complete database in a box.


Sun Fire T2000 is great for OLTP, where you can end up with hundreds of 
users doing quick and small lookups, and T2000 can crank simple thread 
executions far better than others. However, when it comes to long-running 
queries you end up using 1/32 of the power and it may not live up to your 
expectations.  For example, consider your PostgreSQL talking to an Apache 
web server, all on a T2000... You can put them in separate zones if you 
have different administrators for them. :-)


As for PostgreSQL on Solaris, I already have the best parameters to use 
on Solaris based on my tests: the default open_datasync hurts performance 
on Solaris, and so does the default checkpoint_segments; other settings 
are tweaked so that they suit bigger databases and hence may not show 
much difference in performance...


That said, I will still be interested to see your app's performance with 
postgreSQL on a Sun Fire T2000, as there are always ways, with 
perseverance, to improve performance :-)



Regards,
Jignesh


Juan Casero (FL FLC) wrote:


Because I plan to develop a rather large (for us anyway) data warehouse
with PostgreSQL.  I am looking for the right hardware that can handle
queries on a database that might grow to over a 100 gigabytes.  Right
now our decision support system based on postgresql 8.1.3 stores retail
sales information for about four years back *but* only as weekly
summaries.  I want to build the system so it can handle daily sales
transactions also.  You can imagine how many more records this will
involve so I am looking for hardware that can give me the performance I
need to make this project useable.  In other words parsing and loading
the daily transaction logs for our stores is likely to take huge amounts
of effort.  I need a machine that can complete the task in a reasonable
amount of time.  As people start to query the database to find sales
related reports and information I need to make sure the queries will run
reasonably fast for them.  I have already hand optimized all of my
queries on the current system.  But currently I only have weekly sales
summaries.  Other divisions in our company have done a similar project
using MS SQL Server on SMP hardware far outclassing the database server
I currently use and they report heavy loads on the server with less than
ideal query run times.  I am sure I can do my part to optimize the
queries once I start this project but there is only so much you can do.
At some point you just need more powerful hardware.  This is where I am
at right now.  Apart from that since I will only get this one chance to
buy a new server for data processing I need to make sure that I buy
something that can grow over time as our needs change.  I don't want to
buy a server only to find out later that it cannot meet our needs with
future database projects.  I have to balance a limited budget, room for
future performance growth, and current system requirements.  Trust me it
isn't easy.  



Juan

-Original Message-
From: Josh Berkus [mailto:[EMAIL PROTECTED] 
Sent: Thursday, April 06, 2006 2:57 AM

To: pgsql-performance@postgresql.org
Cc: Juan Casero (FL FLC); Luke Lonergan
Subject: Re: [PERFORM] Sun Fire T2000 and PostgreSQL 8.1.3

Juan,

 

Ok that is beginning to become clear to me.  Now I need to determine 
if this server is worth the investment for us.  Maybe it is not a 
speed daemon but to be honest the licensing costs of an SMP aware 
RDBMS is outside our budget.
   



You still haven't explained why you want multi-threaded queries.  This
is sounding like keeping up with the Joneses.

--
Josh Berkus
Aglio Database Solutions
San Francisco

 





Re: [PERFORM] t1000/t2000 sun-servers

2006-03-06 Thread Jignesh K. Shah

Suggestions for benchmarks on Sun Fire T2000...

* Don't try DSS or TPC-H type of test with Postgres on Sun Fire T2000

Since such queries tend to have one connection, it will perform badly 
with Postgres, since it will use only one of the 32 available hardware 
virtual CPUs on the Sun Fire T2000. (Oracle/DB2 have ways of breaking 
the queries into multiple processes and hence can use multiple virtual 
CPUs on the Sun Fire T2000; PostgreSQL cannot do the same in such cases.)


* Use OLTP Type of benchmark

Where you have more than 30 simultaneous users/connections doing work on 
Postgres without bottlenecking on  datafiles of course :-)
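
As a sketch of such an OLTP-style run with pgbench (the scale factor
and client counts below are illustrative; the scale factor should stay
above the client count so the clients are not all contending on the
same branch rows):

```shell
$ pgbench -i -s 100 bench        # initialize at scale factor 100
$ pgbench -c 32 -t 1000 bench    # 32 clients, 1000 transactions each
```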


* Use multiple databases or instances of Postgresql

For example, migrate all your postgresql databases to one T2000. You 
might see that your average response time is not faster, but it can 
probably handle all your databases migrated to that one T2000.


In essence, your single-thread performance will not speed up on a Sun 
Fire T2000, but you can certainly use it to replace all your individual 
postgresql servers in your organization, or see higher scalability in 
terms of the number of users handled with one Sun Fire T2000 server.



For your /etc/system use the parameters as mentioned in
http://www.sun.com/servers/coolthreads/tnb/parameters.jsp

For hints on setting it up for Postgresql refer to other databases setup on
http://www.sun.com/servers/coolthreads/tnb/applications.jsp

If you get specific performance problems  send email to 
pgsql-performance@postgresql.org


Regards,
Jignesh





Neil Saunders wrote:


I may be able to organize a test on a T2000 if someone could give
advice as to an appropriate test to run...

Cheers,

Neil

On 3/6/06, Claus Guttesen [EMAIL PROTECTED] wrote:
 


Hi.

Has anybody tried the new Sun cool-thread servers t1000/t2000 from
Sun? I'd love to see benchmarks with Solaris 10 and pg 8.1.

regards
Claus


   



 





Re: [PERFORM] t1000/t2000 sun-servers

2006-03-06 Thread Jignesh K. Shah


pgbench, in my view, is more of an I/O-write-intensive benchmark.

The T2000 with its internal drives may not perform well with pgbench 
under a high load. If you are using external storage, try it out.


I haven't tried it out yet, but let me know what you see.


-Jignesh


Guido Neitzer wrote:


On 06.03.2006, at 21:10 Uhr, Jignesh K. Shah wrote:

Like migrate all your postgresql databases to one T2000. You might 
see that your average response time is not faster but it can 
probably handle all your databases migrated to one T2000.


In essence, your single-thread performance will not speed up on Sun 
Fire T2000, but you can certainly use it to replace all your 
individual postgresql servers in your organization or see higher 
scalability in terms of number of users handled with one Sun Fire 
T2000 server.



How good is a pgbench test for evaluating things like this? I have 
used it to compare several machines, operating systems and PostgreSQL 
versions - but it was more or less just out of curiosity. The real 
evaluation was made with real-life tests - mostly scripts which 
also tested the application server itself.


But as it is easy to compare several machines with pgbench, I 
just did the tests; they were interesting and reflected the real 
world not as badly as I had feared a benchmark might.


So, personally I'm interested in a simple pgbench test - perhaps with 
some more (> 50) clients simulated ...


cug





Re: [PERFORM] copy and postgresql.conf

2006-02-15 Thread Jignesh K. Shah
What are your postgresql.conf settings for the parameters I suggested?
I believe your wal_buffers and checkpoint_segments could be bigger. If 
that's the case, then yes, you are fine.


As for the background writer I am seeing mixed results yet so not sure 
about that.


But thanks for the feedback.

-Jignesh


FERREIRA, William (VALTECH) wrote:


i tested the latest version of PostgreSQL
and for the same test :
before : 40mn
and now : 12mn :)
faster than Oracle (exactly what i wanted :p )

thanks to everybody

Will


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] on behalf of Albert
Cervera Areny
Sent: Tuesday, 14 February 2006 17:07
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] copy and postgresql.conf



Sorry, COPY improvements came with 8.1

(http://www.postgresql.org/docs/whatsnew)

On Tuesday, 14 February 2006 14:26, FERREIRA, William (VALTECH) wrote:
 


thanks,

i'm using postgresql 8.0.3
there is no primary key and no index on my tables

regards

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] on behalf of Albert
Cervera Areny
Sent: Tuesday, 14 February 2006 12:38
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] copy and postgresql.conf



Hi William,
which PostgreSQL version are you using? Newer (8.0+) versions have some

important performance improvements for the COPY command.

Also, you'll notice significant improvements by creating primary and 
foreign

keys after the copy command. I think config tweaking can improve key and

index creation but I don't think you can improve the COPY command itself.

There are also many threads in this list commenting on this issue; 
you'll find them easily in the archives.

On Tuesday, 14 February 2006 10:44, FERREIRA, William (VALTECH) wrote:
   


hi,

i load data from files using the copy method.
Files contain between 2 and 7 million rows, spread over 5 tables.

Loading all the data takes 40 min, while the same processing takes
17 min with Oracle. I think this time can be improved by changing the
postgresql configuration file. But which parameters do i need to
manipulate, and with which values ?

Here are the specifications of my system :
V250 architecture sun4u
2xCPU UltraSparc IIIi 1.28 GHz.
8 GB RAM.

Regards.

Will


This e-mail is intended only for the above addressee. It may contain
privileged information. If you are not the addressee you must not copy,
distribute, disclose or use any of the information in it. If you have
received it in error please delete it and immediately notify the sender.
Security Notice: all e-mail, sent to or from this address, may be
accessed by someone other than the recipient, for system management and
security reasons. This access is controlled under Regulation of
Investigatory Powers Act 2000, Lawful Business Practises.

 


--

Albert Cervera Areny
Dept. Informàtica Sedifa, S.L.

Av. Can Bordoll, 149
08202 - Sabadell (Barcelona)
Tel. 93 715 51 11
Fax. 93 715 51 12



... DISCLAIMER .
This message and its attachments are intended exclusively for the
named addressee. If you receive this message in error, please
immediately delete it from your system and notify the sender. You
may not use this message or any part of it for any purpose.
The message may contain information that is confidential or
protected by law, and any opinions expressed are those of the
individual sender. Internet e-mail guarantees neither the
confidentiality nor the proper receipt of the message sent.
If the addressee of this message does not consent to the use
of internet e-mail, please inform us immediately.







Re: [PERFORM] copy and postgresql.conf

2006-02-15 Thread Jignesh K. Shah

Actually, fsync is not the default on Solaris (verify using show all;).

(If you look closely, postgresql.conf has it commented out and 
mentioned as the default, but show all tells a different story.)
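
A quick way to see the live value rather than the commented-out one
(a psql session sketch; output shown for illustration, matching what I
have seen on Solaris):

```shell
$ psql -c 'SHOW wal_sync_method;'
 wal_sync_method
-----------------
 open_datasync
```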


In all my cases I saw the default as
wal_sync_method | open_datasync

Also, I had seen quite an improvement by changing the default 
checkpoint_segments from 3 to 64 or 128, and by increasing wal_buffers 
to 64, depending on how heavy your load is.
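
Those suggestions as a postgresql.conf fragment (illustrative values;
tune them to your own load):

```
# postgresql.conf - heavy-write tuning per the advice above
wal_buffers = 64            # up from the default of 8
checkpoint_segments = 64    # up from the default of 3; try 128 under heavier load
```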


Also, open_datasync-type operations benefit from forcedirectio on 
Solaris, so either move the WAL to a forcedirectio-mounted file system 
or try changing the default sync method to fsync (the *stated* default).


Now, if you use fsync, then you need a bigger file system cache, since 
by default Solaris's segmap mechanism only maps 12% of your physical RAM 
for use as the file system buffer cache. Increasing segmap_percent to 50 
on SPARC allows 50% of your RAM to be mapped for that purpose (NOTE: it 
does not reserve the memory; it just allows the memory to be mapped and 
used for the file system buffer cache).


Changing maxphys allows the file system buffer cache to coalesce the 8K 
writes that PostgreSQL does into bigger writes/reads. Also, since you 
are now exploiting the file system buffer cache, file system logging is 
very much recommended (available from a later update of Solaris 8, I 
believe).
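
The Solaris side of that advice as an /etc/system sketch (assumption:
parameter names and values as described above; verify them against the
Sun tuning page for your release, and note a reboot is required):

```
* /etc/system additions for a PostgreSQL workload using the FS buffer cache
set segmap_percent=50      * map up to 50% of RAM for the FS buffer cache (SPARC)
set maxphys=1048576        * allow 1MB physical transfers so 8K writes can coalesce
```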



Regards,
Jignesh





FERREIRA, William (VALTECH) wrote:


with PostgreSQL 8.1.3, here are my parameters (it's the default configuration)

wal_sync_method = fsync
wal_buffers = 8
checkpoint_segments = 3
bgwriter_lru_percent = 1.0
bgwriter_lru_maxpages = 5
bgwriter_all_percent = 0.333
bgwriter_all_maxpages = 5

and you think times can be improved again ?

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 15 February 2006 15:14
To: FERREIRA, William (VALTECH)
Cc: Albert Cervera Areny; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] copy and postgresql.conf



What's your postgresql.conf parameter for the equivalent ones that I
suggested?
I believe your wal_buffers and checkpoint_segments could be bigger. If
that's the case then yep you are fine.

As for the background writer I am seeing mixed results yet so not sure
about that.

But thanks for the feedback.

-Jignesh


FERREIRA, William (VALTECH) wrote:

 


i tested the latest version of PostgreSQL
and for the same test:
before: 40 min
now: 12 min :)
faster than Oracle (exactly what i wanted :p )

thanks to everybody

Will


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] on behalf of Albert
Cervera Areny
Sent: Tuesday, February 14, 2006 17:07
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] copy and postgresql.conf



Sorry, COPY improvements came with 8.1

(http://www.postgresql.org/docs/whatsnew)

On Tuesday, February 14, 2006 at 14:26, FERREIRA, William (VALTECH) wrote:


   


thanks,

i'm using postgresql 8.0.3
there is no primary key and no index on my tables

regards

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] on behalf of Albert
Cervera Areny
Sent: Tuesday, February 14, 2006 12:38
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] copy and postgresql.conf



Hi William,
which PostgreSQL version are you using? Newer (8.0+) versions have some
important performance improvements for the COPY command.

Also, you'll notice significant improvements by creating primary and
foreign keys after the copy command. I think config tweaking can improve
key and index creation, but I don't think you can improve the COPY
command itself.

There are also many threads in this list commenting on this issue;
you'll find them easily in the archives.

On Tuesday, February 14, 2006 at 10:44, FERREIRA, William (VALTECH) wrote:
 

 


hi,

i load data from files using the copy method.
Files contain between 2 and 7 million rows, spread over 5 tables.

For loading all the data, it takes 40 min, and the same processing takes
17 min with Oracle. I think this time can be improved by changing the
postgresql configuration file. But which parameters do I need to
manipulate, and with which values?

Here are the specifications of my system:
V250 architecture sun4u
2x CPU UltraSparc IIIi 1.28 GHz
8 GB RAM.

Regards.

Will


This e-mail is intended only for the above addressee. It may contain
privileged information. If you are not the addressee you must not copy,
distribute, disclose or use any of the information in it. If you have
received it in error please delete it and immediately notify the sender.
Security Notice: all e-mail, sent to or from this address, may be
accessed by someone other than the recipient, for system management and
security reasons. This access is controlled under Regulation of
Investigatory Powers Act 2000, Lawful Business Practises.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

Re: [PERFORM] copy and postgresql.conf

2006-02-14 Thread Jignesh K. Shah

What version of Solaris are you using?

Have you seen the recommendations for using COPY on Solaris?
http://blogs.sun.com/roller/page/jkshah?entry=postgresql_on_solaris_better_use

wal_sync_method = fsync
wal_buffers = 128
checkpoint_segments = 128
bgwriter_percent = 0
bgwriter_maxpages = 0


And also for /etc/system on Solaris 10, 9 SPARC use the following

set maxphys=1048576
set md:md_maxphys=1048576
set segmap_percent=50
set ufs:freebehind=0
set msgsys:msginfo_msgmni = 3584
set semsys:seminfo_semmni = 4096
set shmsys:shminfo_shmmax = 15392386252
set shmsys:shminfo_shmmni = 4096


Can you try putting in one run with this values and send back your 
experiences on whether it helps your workload or not?


Atleast I saw improvements using the above settings with COPY with 
Postgres 8.0 and Postgres 8.1 on Solaris.


Regards,
Jignesh




FERREIRA, William (VALTECH) wrote:


30% faster !!! i will test this new version ...

thanks a lot


--

Albert Cervera Areny
Dept. Informàtica Sedifa, S.L.

Av. Can Bordoll, 149
08202 - Sabadell (Barcelona)
Tel. 93 715 51 11
Fax. 93 715 51 12



... DISCLAIMER .
This message and its  attachments are  intended  exclusively for the
named addressee. If you  receive  this  message  in   error,  please
immediately delete it from  your  system  and notify the sender. You
may  not  use  this message  or  any  part  of it  for any  purpose.
The   message   may  contain  information  that  is  confidential or
protected  by  law,  and  any  

Re: [PERFORM] PostgreSQL Solaris packages now in beta

2006-01-26 Thread Jignesh K. Shah
For people installing PostgreSQL on Solaris with the new packages, it 
will provide a greatly improved experience getting PostgreSQL up and running, 
which was quite an inhibitor in terms of love at first sight. This will 
now help people familiar with Solaris form a great first impression of 
PostgreSQL and hence lower the barrier of entry to PostgreSQL. With 
more than 3.7 million downloads of Solaris 10 already, PostgreSQL now 
has access to an incremental user base of perhaps 3.7 million 
relatively new PostgreSQL users.



Regards,
Jignesh




Robert Lor wrote:



With big thanks to Josh Berkus and Devrim Gunduz, I'm happy to 
announce that Sun has just released a Solaris distribution of 
PostgreSQL 8.1.2  with ready-to-install packages for both Sparc and 
x86.  These packages are currently in Beta, and we expect to FCS in 2 
-3 weeks.  The packages, along with an install guide, are available 
for download at http://pgfoundry.org/projects/solarispackages/


We have tightly integrated PostgreSQL with Solaris in a manner similar 
to the Linux distributions available on postgresql.org. In fact, the 
directory structures are identical.  Starting with Solaris 10 Update 
2, PostgreSQL will be distributed with every copy of Solaris, via 
download and physical media.


We welcome any and all feedback on this PostgreSQL Solaris 
distribution.   Please subscribe to the 
[EMAIL PROTECTED] mailing list to give us 
feedback: http://pgfoundry.org/mail/?group_id=163


BTW, I'm a senior engineer at Sun Microsystems, recently working with 
the PostgreSQL community (namely Josh Berkus, Devrim Gunduz, and Gavin 
Sherry) on the Solaris Packages Project at PgFoundry,  PostgreSQL 
performance optimization on Solaris, and leveraging Solaris 10 
capabilities (e.g. DTrace) specifically for PostgreSQL.  I'll be 
posting a Solaris performance tuning guide in a few weeks.


Regards,
Robert Lor



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Sudden slowdown of Pg server

2006-01-20 Thread Jignesh K. Shah


lockstat is available in Solaris 9. It can help you determine whether 
any kernel-level locks are occurring during that time.
Solaris 10 also has plockstat, which can be used to identify userland 
locks happening in your process.


Since you have Solaris 9, try the following:

You can run (as root):

lockstat sleep 5

and note the output, which can be long.

The outputs of prstat -am, iostat -xczn 3, and vmstat 3 will also help.


prstat -am has a column called LAT; if the value is in double digits, 
then you have a locking issue, which will probably result in a higher SLP 
value for the process. (Interpretation is data- and workload-specific, 
more than this email can decode)


Once you have identified a particular process (if any) as the source 
of the problem, get its pid and look at the outputs of the following 
commands (which are quite intrusive):

truss -c -p $pid 2> truss-syscount.txt

(Ctrl-C after a while to stop collecting)

truss -a -e -u::: -p $pid 2> trussout.txt

(Ctrl-C after a while to stop collecting)
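
Once collected, the -c run already gives per-syscall counts; for the full
trace, a generic one-liner like this (a sketch, not Solaris-specific; the
file name and sample lines are stand-ins) pulls out the most frequent calls:

```shell
# Count the most frequent leading tokens (syscall names) in a trace file.
trace=trussout.txt
printf 'read(5, ...\nread(5, ...\nwrite(6, ...\n' > "$trace"   # stand-in sample data

# Split each line at '(' so $1 is the syscall name, then tally and rank.
awk -F'(' '{ count[$1]++ } END { for (k in count) print count[k], k }' "$trace" | sort -rn
```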

Regards,
Jignesh


Jerry Sievers wrote:

Hello;  


I am going through a post mortem analysis of an infrequent but
recurring problem on a Pg 8.0.3 installation.  Application code
connects to Pg using J2EE pooled connections.

PostgreSQL 8.0.3 on sparc-sun-solaris2.9, compiled by GCC 
sparc-sun-solaris2.8-gcc (GCC) 3.3.2

Database is quite large with respect to the number of tables, some of
which have up to 6 million tuples.  Typical idle/busy connection ratio
is 3/100 but occasionally we'll catch 20 or more busy sessions.

The problem manifests itself and appears like a locking issue.  About
weekly, throughput slows down and we notice the busy connection count
rising minute by minute: 2, 20, 40...  Before long, the app server
detects the lack of responsiveness and fails over to another app server
(not Pg), which in turn attempts a bunch of new connections into
Postgres.

Sampling of the snapshots of pg_locks and pg_stat_activity tables
takes place each minute.

I am wishing for a few new ideas as to what to be watching; Here's
some observations that I've made.

1. At no time do any UN-granted locks show in pg_locks
2. The number of exclusive locks is small: 1, 4, 8
3. Other lock types/modes are numerous but appear like normal workload.
4. There are at least a few old 'IDLE In Transaction' cases in the
  activity view
5. No interesting error messages or warnings in Pg logs.
6. No crash of Pg backend

Other goodies include a bounty of poorly performing queries which are
constantly being optimized now for good measure.  Aside from the heavy
queries, performance is generally decent.

Resource related server configs have been boosted substantially but
have not undergone any formal RD to verify that we're inthe safe
under heavy load.

A max_fsm_relations setting which is *below* our table and index
count was discovered by me today and will be increased this evening
during a maint cycle.

The slowdown and subsequent run-away app server takes place within a
small 2-5 minute window and I have as of yet not been able to get into
Psql during the event for a hands-on look.

Questions;

1. Is there any type of resource lock that can unconditionally block
  another session and NOT appear as UN-granted lock?

2. What in particular other runtime info would be most useful to
  sample here?

3. What Solaris-side runtime stats might give some clues here (maybe?),
  and how often to sample?  Assume it needs to be aggressive due to how
  fast this problem crops up.

Any help appreciated

Thank you


 





Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

2005-12-21 Thread Jignesh K. Shah

Hi Juan,

The Solaris 10 license is free. In fact, I believe you do receive the 
media with a Sun Fire V20z. If you want support, there are various 
pay plans depending on the level of support. If not, your license 
allows you Right to Use anyway for free.


That said, I haven't done much testing of 32/64-bit differences. 
However, for long-term purposes, 64-bit always seems to be the safe bet. 
As for your load performance, a lot of it depends on your file system 
layout also.


Regards,
Jignesh



Juan Casero wrote:

I just sent my boss an email asking him for a Sun v20z with dual 2.2 Ghz 
opterons, 2 Gigs of RAM and RAID 1.  I would have liked a better server 
capable of RAID but that seems to be out of his budget right now.  Ok so I 
assume I get this Sun box.  Most likely I will go with Linux since it is a 
fair bet he doesn't want to pay for the Solaris 10 x86 license.  Although I 
kind of like the idea of using Solaris 10 x86 for this.   I will assume I 
need to install the x64 kernel that comes with say Fedora Core 4.  Should I 
run the Postgresql 8.x binaries in 32 bit mode or 64 bit mode?   My instinct 
tells me 64 bit mode is most efficient for our database size about 20 gigs 
right now but may grow to 100 gigs in a year or so.  I just finished loading 
a 20 gig database on a dual 900 Mhz Ultrasparc III system with 2 gigs of ram 
and about 768 megs of shared memory available for the posgresql server 
running Solaris 10.  The load has smoked a P4 3.2 Ghz system I am using also 
with 2 gigs of ram running postgresql 8.0.3.   I mean I started the sparc 
load after the P4 load.  The sparc load has finished already rebuilding the 
database from a pg_dump file but the P4 system is still going.  The p4 has 
1.3 Gigs of shared memory allocated to postgresql.  How about them apples?



Thanks,
Juan

On Wednesday 21 December 2005 18:57, William Yu wrote:
 


Juan Casero wrote:
   


Can you elaborate on the reasons the opteron is better than the Xeon when
it comes to disk io?   I have a PostgreSQL 7.4.8 box running a DSS.   One
of our
 


Opterons have a 64-bit IOMMU -- Xeons don't. That means in 64-bit mode,
for DMA transfers to memory above 4GB, the OS must allocate memory below
4GB, DMA to that block, and then the CPU must do extra work copying the
memory above 4GB. On the Opteron, it's done by the IO adaptor using DMA
in the background.

---(end of broadcast)---
TIP 6: explain analyze is your friend
   





Re: [PERFORM] PostgreSQL and Ultrasparc T1

2005-12-20 Thread Jignesh K. Shah
But yes, all LWPs (processes and threads) are switched across virtual 
CPUs. There is intelligence built into Solaris to understand which 
strands are executing on which cores, and it will balance across the 
cores too; if there are only 8 threads running, they will essentially 
run on separate cores rather than 8 threads on 2 cores.


The biggest limitation is application scaling. pgbench shows that more 
processes bottlenecking on the same files will probably not perform 
better unless you tune your storage/file system. Those are the issues 
we typically try to solve with community partners (vendors, open 
source), since that gives the biggest benefits.


The best check in such a multi-process environment: if you see greater 
than 60% average CPU utilization in your dual/quad config 
Xeons/Itaniums, then a Sun Fire T2000 will help you a lot. However, if 
you are stuck below 50% (for dual) or 25% (for quad), it means you are 
pretty much stuck at single-CPU performance and/or probably have more 
IO-related contention, and then it won't help you with these systems.


I hope you get the idea on when a workload will perform better on Sun 
Fire T2000 without burning hands.


I will try to test some more with PostgreSQL on these systems to kind of 
highlight what can work or what will not work.


Is pgbench the workload that you prefer? (It already has issues with 
pg_xlog so my guess is it probably won't scale much)

If you have other workload informations let me know.

Thanks.
Regards,
Jignesh



[EMAIL PROTECTED] wrote:


Jignesh,

Juan says the following below:

I figured the number of cores on the T1000/2000 processors would be
utilized by the forked copies of the postgresql server.  From the comments
I have seen so far it does not look like this is the case.

I think this needs to be refuted.  Doesn't Solaris switch processes as well
as threads (LWPs, whatever) equally well amongst cores?  I realize the
process context switch is more expensive than the thread switch, but
Solaris will utilize all cores as processes or threads become ready to run,
correct?

BTW, it's great to see folks with your email address on the list.  I feel
it points to a brighter future for all involved.

Thanks,

Rick


  
Jignesh K. Shah wrote on 12/19/2005 11:19 PM (to Juan Casero, cc
pgsql-performance@postgresql.org):

I guess it depends on what you term as your metric for measurement.
If it is just single-query execution time, it may not be the best on
UltraSPARC T1.
But if you have more than 8 complex queries running simultaneously,
UltraSPARC T1 can do comparatively well, provided the application can
scale along with it.

The best way to approach is to figure out your peak workload, find an
accurate way to measure the true metric, and then design a benchmark
for it and run it on both servers.

Regards,
Jignesh


Juan Casero wrote:

Ok.  That is what I wanted to know.  Right now this database is a
PostgreSQL 7.4.8 system.  I am using it in a sort of DSS role.  I have
weekly summaries of the sales for our division going back three years.
I have a PHP based webapp that I wrote to give the managers access to
this data.  The webapp lets them make selections for reports and then it
submits a parameterized query to the database for execution.  The
returned data rows are displayed and formatted in their web browser.  My
largest sales table is about 13 million rows; along with all the indexes
it takes up about 20 gigabytes.  I need to scale this application up to
nearly 100 gigabytes to handle daily sales summaries.  Once we start
looking at daily sales figures our database size could grow ten to
twenty times.  I use postgresql because it gives me the kind of
enterprise database features I need to program the complex logic for the
queries.  I also need

Re: [PERFORM] PostgreSQL and Ultrasparc T1

2005-12-19 Thread Jignesh K. Shah
Sun Fire T2000 has 3 PCI-E and 1 PCI-X slot free when shipped. Using 
dual fiber channel 2G adapters you can get about 200MB x 8 = 1600MB/sec 
IO bandwidth. Plus when 4G HBAs are supported that will double up. Now I 
think generally that's good enough for 1TB raw data or 2-3 TB Database 
size. Of course typically the database size in PostgreSQL space will be 
in the 100-500GB range so a Sun Fire T2000 can be a good fit with enough 
area to grow at a very reasonable price.


Of course, as someone mentioned, if all you have is one connection and 
postgresql cannot spawn helper processes/threads, you will be limited by 
single-thread performance, which is about 1.2GHz on a Sun Fire T2000. 
Compare that to AMD64 (Sun Fire X4200), which has pretty much similar IO 
bandwidth and the same size chassis, but whose individual AMD64 cores 
run at about 2.4GHz (I believe); the max you can get is 4 cores, and you 
trade off a little power consumption in lieu of faster single-thread 
performance. So choices are available with both architectures. However, 
if you have a webserver driving a PostgreSQL backend, UltraSPARC T1 
might be a better option if you suddenly want to do 100s of db 
connections. The Sun Fire T2000 gives you 8 cores with 32 threads in all 
running on the system. 

With the PostgreSQL 8.1 fix for SMP bufferpool performance, and with ZFS 
now available in the Solaris Express release, it would be interesting to 
see how the combination of PostgreSQL 8.1 and ZFS works on Solaris. ZFS 
is one of the perfect file systems for PostgreSQL, which delegates all 
complexities (like block allocation, fragmentation, etc.) to the 
underlying file system rather than re-implementing its own 
infrastructure.


If somebody is already conducting their own tests, do let me know. As 
soon as I get some free cycles, I want to run ZFS with PostgreSQL using 
Solaris Express. If you have some preferred workloads do let me know.


Regards,
Jignesh


Luke Lonergan wrote:


Juan,

On 12/18/05 8:35 AM, Juan Casero [EMAIL PROTECTED] wrote:

 


Can anyone tell me how well PostgreSQL 8.x performs on the new Sun Ultrasparc
T1 processor and architecture on Solaris 10?   I have a custom built retail
sales reporting application that I developed using PostgreSQL 7.4.8 and PHP on a Fedora
Core 3 intel box.  I want to scale this application upwards to handle a
database that might grow to a 100 GB.  Our company is green mission conscious
now so I was hoping I could use that to convince management to consider a Sun
Ultrasparc T1 or T2 system provided that if I can get the best performance
out of it on PostgreSQL.  So will newer versions of PostgreSQL (8.1.x) be
able to take of advantage of the multiple cores on a T1 or T2?I cannot
change the database and this will be a hard sell unless I can convince them
that the performance advantages are too good to pass up.   The company is
moving in the Win32 direction and so I have to provide rock solid reasons for
why I want to use Solaris Sparc on a T1 or T2 server for this database
application instead of Windows on SQL Server.
   



The Niagara CPUs are heavily multi-threaded and will require a lot of
parallelism to be exposed to them in order to be effective.

Until Sun makes niagara-based machines with lots of I/O channels, there
won't be much I/O parallelism available to match the CPU parallelism.

Bizgres MPP will use the process and I/O parallelism of these big SMP
machines and the version based on Postgres 8.1 will be out in February.

- Luke  




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
 





Re: [PERFORM] PostgreSQL and Ultrasparc T1

2005-12-19 Thread Jignesh K. Shah

I guess it depends on what you term as your metric for measurement.
If it is just single-query execution time, it may not be the best on 
UltraSPARC T1.
But if you have more than 8 complex queries running simultaneously, 
UltraSPARC T1 can do comparatively well, provided the 
application can scale along with it.


The best way to approach is to figure out your peak workload, find an 
accurate way to measure the true metric and then design a  benchmark 
for it and run it on both servers.


Regards,
Jignesh


Juan Casero wrote:

Ok.  That  is what I wanted to know.  Right now this database is a PostgreSQL 
7.4.8 system.  I am using it in a sort of DSS role.  I have weekly summaries 
of the sales for our division going back three years.  I have a PHP based 
webapp that I wrote to give the managers access to this data.  The webapp 
lets them make selections for reports and then it submits a parameterized 
query to the database for execution.  The returned data rows are displayed 
and formatted in their web browser.  My largest sales table is about 13 
million rows along with all the indexes it takes up about 20 gigabytes.  I 
need to scale this application up to nearly 100 gigabytes to handle daily 
sales summaries.  Once we start looking at daily sales figures our database 
size could grow ten to twenty times.  I use postgresql because it gives me 
the kind of enterprise database features I need to program the complex logic 
for the queries.  I also need the transaction isolation facilities it 
provides so I can optimize the queries in plpgsql without worrying about 
multiple users' temp tables colliding with each other.  Additionally, I hope 
to rewrite the front end application in JSP so maybe I could use the 
multithreaded features of the Java to exploit a multicore multi-cpu system.  
There are almost no writes to the database tables.   The bulk of the 
application is just executing parameterized queries and returning huge 
amounts of data.  I know bizgres is supposed to be better at this but I want 
to stay away from anything that is beta.  I cannot afford for this thing to 
go wrong.  My reasoning for looking at the T1000/2000 was simply the large 
number of cores.  I  know postgresql uses a super server that forks copies of 
itself to handle incoming requests on port 5432.  But I figured the number of 
cores on the T1000/2000 processors would be utilized by the forked copies of 
the postgresql server.  From the comments I have seen so far it does not look 
like this is the case.  We had originally sized up a dual processor dual core 
AMD opteron system from HP for this but I thought I could get more bang for 
the buck on a T1000/2000.  It now seems I may have been wrong.  I am stronger 
in Linux than Solaris, so I am not upset; I am just trying to find the best 
hardware for the anticipated needs of this application.


Thanks,
Juan

On Monday 19 December 2005 01:25, Scott Marlowe wrote:
 


From: [EMAIL PROTECTED] on behalf of Juan Casero

QUOTE:

Hi -


Can anyone tell me how well PostgreSQL 8.x performs on the new Sun
Ultrasparc T1 processor and architecture on Solaris 10?   I have a custom
built retail sales reporting that I developed using PostgreSQL 7.48 and PHP
on a Fedora Core 3 intel box.  I want to scale this application upwards to
handle a database that might grow to a 100 GB.  Our company is green
mission conscious now so I was hoping I could use that to convince
management to consider a Sun Ultrasparc T1 or T2 system provided that if I
can get the best performance out of it on PostgreSQL.

ENDQUOTE:

Well, generally, AMD 64 bit is going to be a better value for your dollar,
and run faster than most Sparc based machines.

Also, PostgreSQL is generally faster under either BSD or Linux than under
Solaris on the same box.  This might or might not hold as you crank up the
numbers of CPUs.

PostgreSQL runs one process per connection.  So, to use extra CPUs, you
really need to have more than one connection running against the database.

Mostly, databases tend to be I/O bound; once you give them a lot of
I/O, they'll be CPU bound.

After that lots of memory, THEN more CPUs.  Two CPUs is always useful, as
one can be servicing the OS and another the database.  But unless you're
gonna have lots of users hooked up, more than 2 to 4 CPUs is usually a
waste.

So, I'd recommend a dual core or dual dual core (i.e. 4 cores) AMD64 system
with 2 or more gigs ram, and at least a pair of fast drives in a mirror
with a hardare RAID controller with battery backed cache.  If you'll be
trundling through all 100 gigs of your data set regularly, then get all the
memory you can put in a machine at a reasonable cost before buying lots of
CPUs.

But without knowing what you're gonna be doing we can't really make solid
recommendations...
   



 


Re: [PERFORM] High context switches occurring

2005-12-19 Thread Jignesh K. Shah
It basically says pg_xlog is the bottleneck and move it to the disk with 
the best response time that you can afford. :-)
Increasing checkpoint_segments doesn't seem to help much. Playing with 
wal_sync_method might change the behavior.


For proof .. On Solaris, the /tmp is like a RAM Drive...Of course DO NOT 
TRY ON PRODUCTION.


-bash-3.00$ time pgbench -s 10 -c 10 -t 3000 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 3000
number of transactions actually processed: 30000/30000
tps = 356.578050 (including connections establishing)
tps = 356.733043 (excluding connections establishing)

real    1m24.396s
user    0m2.550s
sys     0m3.404s
-bash-3.00$ mv pg_xlog /tmp
-bash-3.00$ ln -s /tmp/pg_xlog pg_xlog
-bash-3.00$ time pgbench -s 10 -c 10 -t 3000 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 3000
number of transactions actually processed: 30000/30000
tps = 2413.661323 (including connections establishing)
tps = 2420.754581 (excluding connections establishing)

real    0m12.617s
user    0m2.229s
sys     0m2.950s
-bash-3.00$ rm pg_xlog
-bash-3.00$ mv /tmp/pg_xlog pg_xlog
-bash-3.00$ time pgbench -s 10 -c 10 -t 3000 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 3000
number of transactions actually processed: 30000/30000
tps = 350.227682 (including connections establishing)
tps = 350.382825 (excluding connections establishing)

real    1m27.595s
user    0m2.537s
sys     0m3.386s
-bash-3.00$
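
The move-and-symlink trick used above generalizes to pointing pg_xlog at any
faster device. A safe way to rehearse it is with throwaway directories (on a
real cluster you would stop the server first; all paths here are illustrative):

```shell
# Rehearse relocating pg_xlog via symlink, using scratch directories.
PGDATA=$(mktemp -d)          # stands in for the real data directory
FAST=$(mktemp -d)            # stands in for the faster device/filesystem
mkdir "$PGDATA/pg_xlog"      # pretend WAL directory

mv "$PGDATA/pg_xlog" "$FAST/pg_xlog"
ln -s "$FAST/pg_xlog" "$PGDATA/pg_xlog"

ls -ld "$PGDATA/pg_xlog"     # now a symlink to the fast location
```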


Regards,
Jignesh


Oleg Bartunov wrote:


Hi there,

I see very low performance and high context switches on our
dual Itanium2 Slackware box (Linux ptah 2.6.14 #1 SMP)
with 8Gb of RAM, running 8.1_STABLE.  Any tips here?

[EMAIL PROTECTED]:~/cvs/8.1/pgsql/contrib/pgbench$ time pgbench -s 10 -c 10 -t 3000 pgbench

starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 10
number of transactions per client: 3000
number of transactions actually processed: 30000/30000
tps = 163.817425 (including connections establishing)
tps = 163.830558 (excluding connections establishing)

real    3m3.374s
user    0m1.888s
sys     0m2.472s

output from vmstat 2

 2  1  0 4185104 197904 321388800 0  1456  673  6852 25  1 45 29
 6  0  0 4184880 197904 321388800 0  1456  673  6317 28  2 49 21
 0  1  0 4184656 197904 321388800 0  1464  671  7049 25  2 42 31
 3  0  0 4184432 197904 321388800 0  1436  671  7073 25  1 44 29
 0  1  0 4184432 197904 321388800 0  1460  671  7014 28  1 42 29
 0  1  0 4184096 197920 321387200 0  1440  670  7065 25  2 42 31
 0  1  0 4183872 197920 321387200 0  1444  671  6718 26  2 44 28
 0  1  0 4183648 197920 321387200 0  1468  670  6525 15  3 50 33
 0  1  0 4184352 197920 321387200 0  1584  676  6476 12  2 50 36
 0  1  0 4193232 197920 321387200 0  1424  671  5848 12  1 50 37
 0  0  0 4195536 197920 321387200 020  509   104  0  0 99  1
 0  0  0 4195536 197920 321387200 0  1680  57325  0  0 99  1
 0  0  0 4195536 197920 321387200 0 0  50422  0  0 100


processor  : 1
vendor : GenuineIntel
arch   : IA-64
family : Itanium 2
model  : 2
revision   : 2
archrev: 0
features   : branchlong
cpu number : 0
cpu regs   : 4
cpu MHz: 1600.010490
itc MHz: 1600.010490
BogoMIPS   : 2392.06
siblings   : 1



On Mon, 19 Dec 2005, Anjan Dave wrote:



Re-ran it 3 times on each host -

Sun:
-bash-3.00$ time pgbench -s 10 -c 10 -t 3000 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 10
number of transactions per client: 3000
number of transactions actually processed: 30000/30000
tps = 827.810778 (including connections establishing)
tps = 828.410801 (excluding connections establishing)
real    0m36.579s
user    0m1.222s
sys     0m3.422s

Intel:
-bash-3.00$ time pgbench -s 10 -c 10 -t 3000 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 10
number of transactions per client: 3000
number of transactions actually processed: 30000/30000
tps = 597.067503 (including connections establishing)
tps = 597.606169 (excluding connections establishing)
real    0m50.380s
user    0m2.621s
sys     0m7.818s

Thanks,
Anjan


-Original Message-
From: Anjan Dave
Sent: Wed 12/7/2005 10:54 AM
To: Tom Lane
Cc: Vivek Khera; Postgresql Performance
Subject: Re: [PERFORM] High context switches occurring



Thanks for your inputs, Tom. I was going after high concurrent 
clients,

but should have read this carefully -

-s scaling_factor
 
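The point being read carefully above is that -s only takes effect together with -i (initialization); on a plain benchmarking run pgbench simply reports the scale of the existing tables, which is why "scaling factor: 1" shows up in the output despite "-s 10" on the command line. The rows created at init time scale linearly: 100,000 pgbench_accounts rows per unit of scale. A tiny illustrative helper (the function name is made up for this sketch):

```shell
# pgbench -i -s N populates pgbench_accounts with N * 100000 rows;
# a plain run ignores -s and uses whatever scale the tables were built with.
accounts_for_scale() { echo $(( $1 * 100000 )); }

accounts_for_scale 10   # scale 10 -> 1000000 account rows
```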

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-09-30 Thread Jignesh K. Shah

I have seen performance similar to Josh's, and my reasoning is as follows:

* WAL is the biggest bottleneck with its default segment size of 16MB. Many 
people hate to recompile the code to change this default, and increasing 
checkpoint segments helps, but there is still a lot of overhead in the 
rotation of WAL files (even putting WAL on tmpfs shows that it is still 
slow). Having an option for a bigger segment size helps to a small extent 
percentage-wise (and frees up a bit of CPU spent on file rotation).


* Growing files: Even though this is OS dependent, the server spends a lot 
of time growing files in small 8K block increments. If we could request 
bigger chunks, or pre-grow data files to their expected size, that would 
help a lot in such cases.
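A minimal sketch of what "pre-growing" means at the filesystem level: one large allocation up front instead of many 8K extensions. `dd` is used here for portability (`mkfile` would be the Solaris idiom); the 16MB size is arbitrary and the temp file is purely illustrative.

```shell
# Pre-grow a file to its expected size in one large write rather than
# extending it 8K at a time as the server fills it.
F=$(mktemp)
dd if=/dev/zero of="$F" bs=1024k count=16 2>/dev/null   # one 16MB allocation
wc -c < "$F"   # 16777216 bytes
```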


* The COPY command had restrictions, but those have been fixed to a large 
extent. (Great job!)


But of course I have lost touch with programming and can't begin to 
understand the PostgreSQL code well enough to change it myself.


Regards,
Jignesh




Ron Peacetree wrote:


That 11MBps was your =bulk load= speed.  If just loading a table
is this slow, then there are issues with basic physical IO, not just
IO during sort operations.

As I said, the obvious candidates are inefficient physical layout
and/or flawed IO code.

Until the basic IO issues are addressed, we could replace the
present sorting code with infinitely fast sorting code and we'd
still be scrod performance wise.

So why does basic IO suck so badly?

Ron  



-Original Message-
From: Josh Berkus josh@agliodbs.com
Sent: Sep 30, 2005 1:23 PM
To: Ron Peacetree [EMAIL PROTECTED]
Cc: pgsql-hackers@postgresql.org, pgsql-performance@postgresql.org
Subject: Re: [HACKERS] [PERFORM] A Better External Sort?

Ron,

 


Hmmm.
60GB/5400secs= 11MBps.  That's ssllooww.  So the first
problem is evidently our physical layout and/or HD IO layer
sucks.
   



Actually, it's much worse than that, because the sort is only dealing 
with one column.  As I said, monitoring with iostat, our top speed was 
2.2MB/s.
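The back-of-envelope rate quoted in this exchange is easy to verify: 60GB over 5400 seconds works out to roughly 11MB/s.

```shell
# Sanity check of the 60GB / 5400s = ~11MBps figure above (integer MB/s).
echo $(( 60 * 1024 / 5400 ))   # prints 11
```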


--Josh


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly
 



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] RAID Stripe size

2005-09-20 Thread Jignesh K. Shah

Typically your stripe size impacts read and write.

In Solaris, the trick is to match it with your maxcontig parameter. If 
you set maxcontig to 128 pages, which is 128 * 8K = 1024K (1M), then your 
optimal stripe size is 128 * 8 / (number of spindles in the LUN). Assuming 
the number of spindles is 6, you get an odd number; in such cases either 
the current IO or the next sequential IO is going to be a little bit 
inefficient depending on what you select (as a rule of thumb, however, 
just take the closest stripe size). But if your number of spindles is 8, 
you get a clean 128, and hence it makes sense to select 128K. (maxcontig 
is a parameter in Solaris which defines the maximum contiguous space 
allocated to a block, which really helps in the case of sequential IO 
operations.)
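The rule of thumb above can be written down as a tiny helper. This is a hypothetical sketch (the function name is made up): stripe size per spindle = maxcontig in 8K pages, times 8, divided by the spindle count, in KB. Integer division stands in for "take the closest stripe size".

```shell
# Per-spindle stripe size in KB from maxcontig (in 8K pages) and spindles.
stripe_kb() { echo $(( $1 * 8 / $2 )); }

stripe_kb 128 8   # 8 spindles -> 128 (the clean case)
stripe_kb 128 6   # 6 spindles -> 170 (the "odd number" case)
```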


But as you can see, this was maxcontig-dependent in my case. What if your 
maxcontig is way off track? This can happen if your IO pattern is more 
and more random. In such cases maxcontig is better at lower numbers to 
reduce space wastage, and in effect reducing your stripe size reduces 
your response time.


This means it is now workload dependent: random IOs or sequential IOs 
(at least where IOs can be clubbed together).


As you can see, stripe size in Solaris is ultimately dependent on your 
workload. My guess is that on any other platform, too, the stripe size 
depends on your workload and how it accesses the data. A lower stripe 
size helps smaller IOs perform better but lacks total throughput 
efficiency, while a larger stripe size increases throughput efficiency 
at the cost of response time for your small IO requirements.


Don't forget that many file systems will buffer your IOs and can club them 
together if they look sequential from the filesystem's point of view. In 
such cases the effective IO size is what matters for stripe sizes.


If your effective IO sizes are big, go for a larger stripe size.
If your effective IO sizes are small and response time is critical, go 
for smaller stripe sizes.


Regards,
Jignesh

evgeny gridasov wrote:


Hi Everybody!

I've got a spare machine which is a 2x XEON 3.2GHz, 4GB RAM,
14x140GB SCSI 10k (LSI MegaRAID 320U). It is going into production in 3-5 months.
I have free time to run tests on this machine, and I could test different
stripe sizes if somebody prepares a test script and data for that.

I could also test different RAID modes 0,1,5 and 10 for this script.

I guess the community needs these results.

On 16 Sep 2005 04:51:43 -0700
bm\\mbn [EMAIL PROTECTED] wrote:

 


Hi Everyone

The machine is IBM x345 with ServeRAID 6i 128mb cache and 6 SCSI 15k
disks.

2 disks are in RAID1 and hold the OS, swap & pg_xlog
4 disks are in RAID10 and hold the Cluster itself.

The DB will have two major tables, one with 10 million rows and one with
100 million rows.
All the activity against these tables will be SELECTs.

Currently the stripe size is 8k. I have read in many places that this is
a poor setting.

Am I right?
   



 



--
__

Jignesh K. Shah
MTS Software Engineer, 
MDE - Horizontal Technologies 
Sun Microsystems, Inc

Phone: (781) 442 3052
Email: [EMAIL PROTECTED]
__




