Re: [HACKERS] Patch LWlocks instrumentation

2009-09-14 Thread Pierre Frédéric Caillaud



> Have you looked at the total execution time with and without the
> LWLOCK_TIMING_STATS?


It didn't show any significant overhead on the little COPY test I made.
On selects it probably does (just like EXPLAIN ANALYZE), but I didn't test.
It is not meant to be always active (it's a #define), so I guess it would
be OK though.


I'm going to modify it according to your suggestions and repost it (why
didn't I do that first?...)



> Not that this changes your conclusion.  With or without that distortion I
> completely believe that WALInsertLock is the bottleneck of parallel bulk
> copy into unindexed tables.  I just can't find anything else it is a primary
> bottleneck on.  I think the only real solution for bulk copy is to call
> XLogInsert less often.  For example, it could build blocks in local memory,
> then when done copy it into the shared buffers and then toss the entire
> block into WAL in one call.  Easier said than implemented, of course.


Actually,

http://archives.postgresql.org/pgsql-hackers/2009-09/msg00806.php



Re: [HACKERS] Patch LWlocks instrumentation

2009-09-14 Thread Jeff Janes
2009/9/14 Pierre Frédéric Caillaud 

>
> A little bit of a reply to Jeff's email about WALInsertLock.
>
> This patch instruments LWLocks, it is controlled with the following
> #define's in lwlock.c :
>
> LWLOCK_STATS
> LWLOCK_TIMING_STATS
>
> It is an upgrade of current lwlocks stats.
>

Hi Pierre,

Have you looked at the total execution time with and without the
LWLOCK_TIMING_STATS?

I've implemented something similar to this myself (only without attempting
to make it portable and otherwise worthy of submitting as a general-interest
patch).  What I found is that attempting to time every hold substantially
increased the overall run time (which I would worry distorts the reported
times; cue bad Heisenberg analogies).  The problem is that gettimeofday is
slow, and on some multi-processor systems it is a global point of
serialization, making it even slower.  I decided to time only the time spent
waiting on a block, and not the time spent holding the lock.  This way you
only call gettimeofday twice if you actually need to block, and not at all
if you immediately get the lock.  This had a much smaller effect on runtime,
and the info produced was sufficient for my purposes.
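
In outline, the pattern is something like the stand-alone sketch below.  This
is only an illustration of the idea, using a pthread mutex instead of the real
lwlock.c code; the counters and helper names are invented for the example and
are not taken from the patch:

#include <pthread.h>
#include <stdio.h>
#include <sys/time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* per-process counters, in the spirit of the per-backend arrays in lwlock.c */
static double wait_seconds = 0.0;   /* total time spent blocked on the lock */
static long   wait_count   = 0;     /* how many times we had to block */

static double
seconds_between(const struct timeval *a, const struct timeval *b)
{
    return (b->tv_sec - a->tv_sec) + (b->tv_usec - a->tv_usec) / 1e6;
}

static void
timed_acquire(void)
{
    struct timeval t0, t1;

    if (pthread_mutex_trylock(&lock) == 0)
        return;                      /* got it immediately: no gettimeofday at all */

    gettimeofday(&t0, NULL);         /* first call: we are about to block */
    pthread_mutex_lock(&lock);
    gettimeofday(&t1, NULL);         /* second call: we finally hold the lock */

    wait_seconds += seconds_between(&t0, &t1);
    wait_count++;
}

int
main(void)
{
    timed_acquire();                 /* lock is free here, so this takes the fast path */
    pthread_mutex_unlock(&lock);
    printf("blocked %ld time(s) for %.6f s\n", wait_count, wait_seconds);
    return 0;
}

The point is that the uncontended path, which should be by far the most common
one, never touches the clock at all.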

Not that this changes your conclusion.  With or without that distortion I
completely believe that WALInsertLock is the bottleneck of parallel bulk
copy into unindexed tables.  I just can't find anything else it is a primary
bottleneck on.  I think the only real solution for bulk copy is to call
XLogInsert less often.  For example, it could build blocks in local memory,
then when done copy it into the shared buffers and then toss the entire
block into WAL in one call.  Easier said than implemented, of course.
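
As a pure analogy (none of this is PostgreSQL code, and every name in it is
made up for the example), the batching idea amounts to filling a local
page-sized buffer and paying for one expensive "log" call per block instead
of one per row:

#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 8192

static char   local_page[PAGE_SIZE];   /* backend-local staging buffer */
static size_t used = 0;
static long   log_calls = 0;

/* stand-in for the expensive, serialized call (XLogInsert in the analogy) */
static void
log_chunk(const void *data, size_t len)
{
    (void) data;
    (void) len;
    log_calls++;
}

/* queue one small record; emit a single log call whenever the page fills up */
static void
add_record(const char *rec, size_t len)
{
    if (used + len > PAGE_SIZE)
    {
        log_chunk(local_page, used);
        used = 0;
    }
    memcpy(local_page + used, rec, len);
    used += len;
}

int
main(void)
{
    for (long i = 0; i < 1000000; i++)
        add_record("one copied row\n", 15);
    if (used > 0)
        log_chunk(local_page, used);    /* flush the partially filled tail */
    printf("1000000 records -> %ld log calls\n", log_calls);
    return 0;
}

With 8 kB blocks and small rows that is a few hundred rows per call instead
of one call per row.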

Cheers,

Jeff


[HACKERS] Patch LWlocks instrumentation

2009-09-14 Thread Pierre Frédéric Caillaud


A little bit of a reply to Jeff's email about WALInsertLock.

This patch instruments LWLocks; it is controlled with the following
#defines in lwlock.c:


LWLOCK_STATS
LWLOCK_TIMING_STATS

It is an upgrade of the current LWLock stats.
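
(To turn them on, that presumably just means defining both symbols at the
top of lwlock.c before rebuilding, along these lines:)

/* e.g. near the top of src/backend/storage/lmgr/lwlock.c */
#define LWLOCK_STATS
#define LWLOCK_TIMING_STATS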

When active, at backend exit, it will display stats as shown below (here,  
we have a parallel COPY with 4 concurrent processes into the same table,  
on a 4 core machine).
If the (rather wide) sample output is mangled in your mail client, I've  
attached it as a separate text file.


 Lock stats for PID 22403
   PID  Lock  ShAcq  ShWait  ShWaitT  ShHeldT   ExAcq  ExWait           ExWaitT           ExHeldT  Name
 22403     7      0       0     0.00     0.00  252730     338  24.02 ( 53.49 %)   7.25 ( 16.14 %)  WALInsert
 22403     8      0       0     0.00     0.00   19501      73   3.48 (  7.75 %)   0.40 (  0.88 %)  WALWrite

 Lock stats for PID 22404
   PID  Lock  ShAcq  ShWait  ShWaitT  ShHeldT   ExAcq  ExWait           ExWaitT           ExHeldT  Name
 22404     7      0       0     0.00     0.00  252724     683  23.34 ( 51.59 %)   8.24 ( 18.20 %)  WALInsert
 22404     8      0       0     0.00     0.00   19418      90   4.37 (  9.67 %)   0.44 (  0.97 %)  WALWrite

 Lock stats for PID 22402
   PID  Lock  ShAcq  ShWait  ShWaitT  ShHeldT   ExAcq  ExWait           ExWaitT           ExHeldT  Name
 22402     7      0       0     0.00     0.00  252735     958  24.06 ( 52.73 %)   8.05 ( 17.63 %)  WALInsert
 22402     8      0       0     0.00     0.00   19154      97   4.21 (  9.22 %)   0.39 (  0.85 %)  WALWrite

 Lock stats for PID 22400
   PID  Lock  ShAcq  ShWait  ShWaitT  ShHeldT   ExAcq  ExWait           ExWaitT           ExHeldT  Name
 22400     7      0       0     0.00     0.00  252736     265  25.50 ( 55.59 %)   6.74 ( 14.70 %)  WALInsert
 22400     8      0       0     0.00     0.00   19391      66   2.95 (  6.42 %)   0.39 (  0.85 %)  WALWrite



Here we see that PID 22400 spent:

25.50 s waiting to get exclusive on WALInsert
 6.74 s while holding exclusive on WALInsert

The percentages represent the fraction of time relative to the backend
process's lifetime.
Here, I exited the processes right after committing the transactions, but if
you use psql and want accurate percentages, you'll need to exit soon after
the query you want to benchmark.
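
For example, for PID 22400 above: 25.50 s is 55.59 % of the run, so that
backend lived for roughly 25.50 / 0.5559 ≈ 46 s, and about 25.50 + 6.74 ≈ 32 s
of it (≈ 70 %) went to waiting for or holding the exclusive WALInsert lock.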


Here, for example, backends spend more than 50% of their time waiting on  
WALInsert...


lwlock_instrumentation.patch
Description: Binary data


lwlock_instrumentation.sample
Description: Binary data
