Re: [HACKERS] Why are we waiting?
Tom Lane wrote:
> Gregory Stark <[EMAIL PROTECTED]> writes:
> > This is a tangent but are these actual Postgres processes? What's the logic
> > behind trying to run a 1,000 processes on a box with 16 cpus?
>
> We should certainly be careful about trying to eliminate contention in this
> scenario at the cost of making things slower in more normal cases, but it
> seems interesting to stress the system just to see what happens.
>
> > Was this with your patch to raise the size of the clog lru?
>
> That's an important question.
>
> > What is MaxBackends actually set to for the runs.
>
> That I think is not. I'm fairly sure there are no performance-relevant paths
> in which cost is driven by MaxBackends rather than the actual current number
> of live backends. Certainly nothing in or around the ProcArray would act
> that way.
>
> regards, tom lane

I guess I was not clear. It was PostgreSQL 8.3.0 (with no source code changes), compiled 64-bit with DTrace enabled. MaxBackends was set to 1500, but I don't think that by itself slows anything down. The connections are "pre-opened", though: when 500 users are actively doing work there are about 1006 postgresql processes running.

Yes, I am taking the database to an extreme, but there is generally some think time of about 50 ms involved, so there are time slices available for other users. Commercial databases can do pretty well on such systems, so it is not unrealistic to expect PostgreSQL to perform here as well. The idea behind stress testing is to prove that PostgreSQL can go beyond these 16 cores; in fact our target is about 64 cores soon.

Regards,
Jignesh

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Why are we waiting?
Gregory Stark <[EMAIL PROTECTED]> writes: > This is a tangent but are these actual Postgres processes? What's the logic > behind trying to run a 1,000 processes on a box with 16 cpus? We should certainly be careful about trying to eliminate contention in this scenario at the cost of making things slower in more normal cases, but it seems interesting to stress the system just to see what happens. > Was this with your patch to raise the size of the clog lru? That's an important question. > What is MaxBackends actually set to for the runs. That I think is not. I'm fairly sure there are no performance-relevant paths in which cost is driven by MaxBackends rather than the actual current number of live backends. Certainly nothing in or around the ProcArray would act that way. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Why are we waiting?
"Jignesh K. Shah" <[EMAIL PROTECTED]> writes: > for about 500users : > For about 700 Users: > At 1000 users This is a tangent but are these actual Postgres processes? What's the logic behind trying to run a 1,000 processes on a box with 16 cpus? They're all just going to be queuing up for i/o requests or sitting runnable waiting for a timeslice. Was this with your patch to raise the size of the clog lru? What is MaxBackends actually set to for the runs. Is it the same even when you're actually running with fewer backends? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Why are we waiting?
Last try for the script/results (truncating the less significant portions of the output, which are too big).

Staale Smedseng wrote:
> The other locks should have been output correctly, however. But as Tom
> pointed out, the dynamic locks were not in the equation. So now we're
> measuring all lock waits instead of assuming. :-)
>
> Staale

Here are some recent runs on 4 quad-core Xeons using PostgreSQL 8.3.0, with 10-second runs. Hope it gets through this time.

-Jignesh

For about 500 users:

        Lock Id       Mode                Count
             64     Shared                  754
             53  Exclusive                  846
             43  Exclusive                 1379
             11     Shared                 2038
              3  Exclusive                 2376
             47  Exclusive                 2900
              4     Shared                 3194
             42  Exclusive                 3389
             50  Exclusive                 3689
             41  Exclusive                 5280
             11  Exclusive                 6560
             48  Exclusive                10837
              7  Exclusive                20599
             44  Exclusive                29193
             46  Exclusive                30299
              4  Exclusive                66303

        Lock Id       Mode   Combined Time (ns)
             43  Exclusive           3440644758
             47  Exclusive           6392480847
             50  Exclusive           7199215298
             42  Exclusive           7865126413
             11  Exclusive           8503940437
              4     Shared           9142071412
             61     Shared          10613733376
             41  Exclusive          14380504651
             48  Exclusive          16631996177
              7  Exclusive          17485716438
             46  Exclusive          61584135053
             44  Exclusive          68889040292
              4  Exclusive         177458486936

For about 700 users:

        Lock Id       Mode                Count
              3  Exclusive                 2873
              4     Shared                 3436
             47  Exclusive                 3555
             11     Shared                 4098
             50  Exclusive                 4392
             42  Exclusive                 4573
             11  Exclusive                 6209
             41  Exclusive                 7552
             48  Exclusive                12335
              7  Exclusive                22469
             44  Exclusive                36987
             46  Exclusive                38467
              4  Exclusive                81453

        Lock Id       Mode   Combined Time (ns)
              7  Exclusive          12033761450
             43  Exclusive          12217953092
             50  Exclusive          15454797539
             47  Exclusive          15684068953
             42  Exclusive          19365161311
              4     Shared          27038955376
             48  Exclusive          32990318986
             41  Exclusive          48350689916
             46  Exclusive         193727946721
             44  Exclusive         212025745038
              4  Exclusive         713263386624

At 1000 users:

        Lock Id       Mode                Count
             59     Shared                10475
             58     Shared                11062
             57     Shared                13726
              7  Exclusive                18548
             44  Exclusive                29714
             46  Exclusive                33886
              4  Exclusive                74773
             11  Exclusive                79596

        Lock Id       Mode   Combined Time (ns)
             43  Exclusive           9067359926
             47  Exclusive          12259067669
             50  Exclusive          13239372833
             42  Exclusive          16534292830
              7  Exclusive          23505490039
             48  Exclusive          24991948402
             41  Exclusive          33874749560
             11     Shared          43963854482
              4     Shared          64098606143
             63     Shared         130988696365
             64     Shared         137865936811
             61     Shared         140978086498
             59     Shared         161661023016
             62     Shared         163857754020
             58     Shared             16794620
             44  Exclusive         220719773416
             57     Shared         245170386594
             46  Exclusive                 3262
Re: [HACKERS] Why are we waiting?
On Thu, 2008-02-07 at 18:12, Simon Riggs wrote:
> I just realised you are using a lookup to get the text for the name of
> the lock. You used the same lookup table for both releases?

Oh, it wasn't quite that bad. :-) The two DTrace scripts had been revised to correspond with the two different declarations of LWLockId in lwlock.h (for 8.2.5 and 8.3 respectively). But somehow the value for the last lock in our lookup table, the one corresponding to FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS in the enum, got turned into n+1 instead of n+NUM_BUFFER_PARTITIONS. The other locks should have been output correctly, however.

But as Tom pointed out, the dynamic locks were not in the equation. So now we're measuring all lock waits instead of assuming. :-)

Staale

---(end of broadcast)---
TIP 6: explain analyze is your friend
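To make the off-by-an-offset concrete: scripts of this kind typically build the lock-name table in a BEGIN clause and key their aggregations on it. The sketch below is illustrative only (it is not the script discussed above), and the numeric values are assumptions that would have to be read out of the lwlock.h for the exact release being traced; the 8.3 defaults, with NUM_BUFFER_PARTITIONS = 16, are assumed here.

    /* lwlock-names.d -- illustrative sketch, not the script from this thread */
    BEGIN
    {
            lockname[3]  = "XidGenLock";
            lockname[4]  = "ProcArrayLock";
            lockname[7]  = "WALInsertLock";
            lockname[11] = "CLogControlLock";
            /* ... one entry per individually named LWLock ... */
            lockname[25] = "FirstBufMappingLock";   /* assumed 8.3 value */

            /*
             * The bug described above: this entry belongs at
             * FirstBufMappingLock + NUM_BUFFER_PARTITIONS (25 + 16 = 41 under
             * the assumed 8.3 values), not at the next enum value.  With an
             * n+1 value here, waits on a buffer-mapping partition lock get
             * reported under the FirstLockMgrLock label.
             */
            lockname[41] = "FirstLockMgrLock";
    }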
Re: [HACKERS] Why are we waiting?
I don't think my earlier message got through. We use separate lookup tables for 8.2.5 and 8.3, based on the respective lwlock.h for each version.

-Jignesh

Simon Riggs wrote:
> On Thu, 2008-02-07 at 16:29 +0100, Staale Smedseng wrote:
> > On Wed, 2008-02-06 at 19:55, Tom Lane wrote:
> > > I am wondering if the waits are being attributed to the right locks ---
> > > I remember such an error in a previous set of dtrace results, and some
> > > of the other details such as claiming shared lock delays but no
> > > exclusive lock delays for FirstLockMgrLock seem less than credible as
> > > well.
> >
> > Good catch. We've checked the DTrace scripts against the respective
> > versions of lwlock.h, and the FirstLockMgrLock is off (this is actually
> > the results for FirstBufMappingLock).
>
> I just realised you are using a lookup to get the text for the name of
> the lock. You used the same lookup table for both releases?

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Why are we waiting?
On Thu, 2008-02-07 at 16:29 +0100, Staale Smedseng wrote: > On Wed, 2008-02-06 at 19:55, Tom Lane wrote: > > I am wondering if the waits are being > > attributed to the right locks --- I remember such an error in a previous > > set of dtrace results, and some of the other details such as claiming > > shared lock delays but no exclusive lock delays for FirstLockMgrLock > > seem less than credible as well. > > Good catch. We've checked the DTrace scripts against the respective > versions of lwlock.h, and the FirstLockMgrLock is off (this is actually > the results for FirstBufMappingLock). I just realised you are using a lookup to get the text for the name of the lock. You used the same lookup table for both releases? -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Why are we waiting?
Staale Smedseng <[EMAIL PROTECTED]> writes: > Good catch. We've checked the DTrace scripts against the respective > versions of lwlock.h, and the FirstLockMgrLock is off (this is actually > the results for FirstBufMappingLock). > However, this is the last lock in the enum that we trace, the other > lower-numbered lock enums were correctly mapped. (In particular the > ProcArrayLock which we've been puzzled by.) Hmm, do you mean that you're just neglecting to collect any stats about all the dynamically assigned locks? That doesn't seem like it's going to offer a very good view of what's going on. I think the most useful stats would include aggregate wait data for all the lockmgr locks, all the bufmapping locks, all the buffer content locks, and all the buffer I/O locks. We'd like to think that contention for those would be low because of the partitioning ... but the point of measuring is to find out, not just hope. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
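One way to get that kind of aggregate view is to bucket the lock ids into classes before aggregating. The sketch below is illustrative only: it assumes the postgresql provider's lwlock-startwait/lwlock-endwait probes report (lockid, mode), and the class boundaries (FirstBufMappingLock = 25, NUM_BUFFER_PARTITIONS = NUM_LOCK_PARTITIONS = 16, everything above the fixed range being dynamically assigned locks such as buffer content and I/O locks) are taken from the 8.3 defaults and would need to be checked against the actual lwlock.h.

    /* lwlock-classes.d -- sketch only; boundary values are assumed, not checked */
    postgresql*:::lwlock-startwait
    {
            self->ts = timestamp;
    }

    postgresql*:::lwlock-endwait
    /self->ts/
    {
            this->grp = arg0 < 25 ? "named lock" :
                        arg0 < 41 ? "buffer mapping partition" :
                        arg0 < 57 ? "lock manager partition" :
                                    "dynamic (buffer content/IO, ...)";
            @waits[this->grp]  = count();
            @waitns[this->grp] = sum(timestamp - self->ts);
            self->ts = 0;
    }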
Re: [HACKERS] Why are we waiting?
On Wed, 2008-02-06 at 19:55, Tom Lane wrote:
> I am wondering if the waits are being attributed to the right locks --- I
> remember such an error in a previous set of dtrace results, and some of the
> other details such as claiming shared lock delays but no exclusive lock
> delays for FirstLockMgrLock seem less than credible as well.

Good catch. We've checked the DTrace scripts against the respective versions of lwlock.h, and the FirstLockMgrLock is off (this is actually the results for FirstBufMappingLock). However, this is the last lock in the enum that we trace; the other, lower-numbered lock enums were correctly mapped. (In particular the ProcArrayLock, which we've been puzzled by.)

Also, there was no clog buffer tweaking or pg_locks monitoring done when this benchmark was run, AFAIK.

We'll redo the benchmarks and post new scripts and results.

Staale

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Why are we waiting?
Simon Riggs <[EMAIL PROTECTED]> writes: > On Wed, 2008-02-06 at 14:42 -0500, Tom Lane wrote: >> Not really, considering the extremely limited use of LW_SHARED in lock.c >> (GetLockConflicts is used only by CREATE INDEX CONCURRENTLY, and >> GetLockStatusData only by the pg_locks view). For the type of benchmark >> that I gather this is, there should be *zero* LW_SHARED acquisitions at >> all. And even if there are some, they could only be blocking against >> the (undoubtedly much more frequent) LW_EXCLUSIVE acquisitions; it's not >> very credible that there is zero contention among the LW_EXCLUSIVE locks >> yet a few shared acquirers manage to get burnt. > ...but the total wait time on those lock waits was 24 microseconds. I > hardly call that burnt. What you are failing to grasp is that the data is simply not credible (unless perhaps Staale fesses up that his benchmark includes a whole lot of pg_locks monitoring, in which case I'd want to see it redone without anyway). regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Why are we waiting?
On Wed, 2008-02-06 at 14:42 -0500, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > There were only 2 lock delays for FirstLockMgrLock in SHARED mode, so it > > seems believable that there were 0 lock delays in EXCLUSIVE mode. > > Not really, considering the extremely limited use of LW_SHARED in lock.c > (GetLockConflicts is used only by CREATE INDEX CONCURRENTLY, and > GetLockStatusData only by the pg_locks view). For the type of benchmark > that I gather this is, there should be *zero* LW_SHARED acquisitions at > all. And even if there are some, they could only be blocking against > the (undoubtedly much more frequent) LW_EXCLUSIVE acquisitions; it's not > very credible that there is zero contention among the LW_EXCLUSIVE locks > yet a few shared acquirers manage to get burnt. ...but the total wait time on those lock waits was 24 microseconds. I hardly call that burnt. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Why are we waiting?
Simon Riggs <[EMAIL PROTECTED]> writes: > There were only 2 lock delays for FirstLockMgrLock in SHARED mode, so it > seems believable that there were 0 lock delays in EXCLUSIVE mode. Not really, considering the extremely limited use of LW_SHARED in lock.c (GetLockConflicts is used only by CREATE INDEX CONCURRENTLY, and GetLockStatusData only by the pg_locks view). For the type of benchmark that I gather this is, there should be *zero* LW_SHARED acquisitions at all. And even if there are some, they could only be blocking against the (undoubtedly much more frequent) LW_EXCLUSIVE acquisitions; it's not very credible that there is zero contention among the LW_EXCLUSIVE locks yet a few shared acquirers manage to get burnt. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Why are we waiting?
On Wed, 2008-02-06 at 13:55 -0500, Tom Lane wrote: > Gregory Stark <[EMAIL PROTECTED]> writes: > > "Staale Smedseng" <[EMAIL PROTECTED]> writes: > >> Also, an interesting observation is that the hot locks seem to have > >> changed from v8.2 to v8.3, making the ProcArrayLock more contended. See > >> the following outputs: > >> > >> PostgreSQL 8.2 (32-bit): > >> ... > >> PostgreSQL 8.3 (64-bit): > >> ... > > > I'm not sure 32-bit and 64-bit cases are going to be directly comparable. We > > could have a problem with cache line aliasing on only one or the other for > > example. > > Yeah, I find these numbers highly dubious. AFAIR we didn't do anything > that would have reduced CLogControlLock contention, and we definitely > did work to reduce ProcArrayLock contention, so the claimed results seem > directly opposite to expectation. I am wondering if the waits are being > attributed to the right locks --- I remember such an error in a previous > set of dtrace results, and some of the other details such as claiming > shared lock delays but no exclusive lock delays for FirstLockMgrLock > seem less than credible as well. There were only 2 lock delays for FirstLockMgrLock in SHARED mode, so it seems believable that there were 0 lock delays in EXCLUSIVE mode. I assumed that Staale is running with clog buffers tweaked? Can you confirm Staale? -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Why are we waiting?
Gregory Stark <[EMAIL PROTECTED]> writes: > "Staale Smedseng" <[EMAIL PROTECTED]> writes: >> Also, an interesting observation is that the hot locks seem to have >> changed from v8.2 to v8.3, making the ProcArrayLock more contended. See >> the following outputs: >> >> PostgreSQL 8.2 (32-bit): >> ... >> PostgreSQL 8.3 (64-bit): >> ... > I'm not sure 32-bit and 64-bit cases are going to be directly comparable. We > could have a problem with cache line aliasing on only one or the other for > example. Yeah, I find these numbers highly dubious. AFAIR we didn't do anything that would have reduced CLogControlLock contention, and we definitely did work to reduce ProcArrayLock contention, so the claimed results seem directly opposite to expectation. I am wondering if the waits are being attributed to the right locks --- I remember such an error in a previous set of dtrace results, and some of the other details such as claiming shared lock delays but no exclusive lock delays for FirstLockMgrLock seem less than credible as well. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Why are we waiting?
> I'm not sure 32-bit and 64-bit cases are going to be directly comparable. We
> could have a problem with cache line aliasing on only one or the other for
> example.

Agreed, this is likely comparing apples and oranges. I'll see if I can get a one-to-one comparison done (these were the numbers I had close by when writing the email).

> But that is a pretty striking difference. Does the 8.3 run complete more
> transactions in that time?

I'll make sure to include those numbers as well. :)

Staale

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Why are we waiting?
On Wed, 2008-02-06 at 18:44 +0100, Staale Smedseng wrote: > > What is the frequency distribution of lock wait time on ProcArrayLock? > > See below for wait time distributions for ProcArrayLock (both shared and > exclusive). The time measured is from entry into LWLockAcquire() until > return. I've recorded the same data in two different resolutions (ms, > and us for the lower part of the distribution). The DTrace script is at > the bottom. > > These results are for 1000 TPC-C like clients, and measured over the > 1000 PostgreSQL processes over a period of 10 seconds. Thanks! Interesting resonance pattern. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Why are we waiting?
> What is the frequency distribution of lock wait time on ProcArrayLock?

See below for wait time distributions for ProcArrayLock (both shared and exclusive). The time measured is from entry into LWLockAcquire() until return. I've recorded the same data in two different resolutions (ms, and us for the lower part of the distribution). The DTrace script is at the bottom.

These results are for 1000 TPC-C like clients, and measured over the 1000 PostgreSQL processes over a period of 10 seconds.

> Does the distribution vary over time?

Hmm... I will have to get back to you on that one.

Staale

CPU     ID                    FUNCTION:NAME
  6  71245                      :tick-10sec

  Total LW_EXCLUSIVE            25178
  Total Transaction Starts      25679
  Total LW_SHARED              107211

  LW_SHARED [ms]
           value    count
             < 0        0
               0   100565
              10        1
              20        0
              30        0
              40        0
              50        0
              60        0
              70        6
              80      304
              90     1370
             100     2685
             110     1731
             120      307
             130       13
             140        0
             150        0
             160        5
             170        4
             180       26
             190       36
             200       24
             210       61
             220       49
             230       15
             240        0
             250        0
             260        0
             270        0
             280        2
             290        2
             300        0
             310        2
             320        1
             330        1
             340        0
             350        0
             360        0
             370        0
             380        0
             390        0
             400        1
             410        0

  LW_EXCLUSIVE [ms]
           value    count
             < 0        0
               0     1565
              10        0
              20        0
              30        0
              40        0
              50        0
              60        0
              70       16
              80      894
              90     4108
             100     8090
             110        4
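The DTrace script referenced above was cut off in the archive, so the following is only an illustrative sketch of the approach described (timing LWLockAcquire() from entry to return with the pid provider), not the script that produced those numbers. The ProcArrayLock id (4) and the LW_EXCLUSIVE/LW_SHARED ordering (0/1) are assumptions based on 8.3's lwlock.h and must be verified against the binary being traced; unlike the measurement above, this traces a single backend.

    /* procarray-wait-dist.d -- sketch only.  Run against one backend:
     *   dtrace -s procarray-wait-dist.d -p <backend pid> */

    pid$target::LWLockAcquire:entry
    /arg0 == 4/
    {
            /* arg0 == 4 assumes ProcArrayLock's 8.3 enum value */
            self->ts   = timestamp;
            self->mode = arg1;   /* assumed: 0 = LW_EXCLUSIVE, 1 = LW_SHARED */
    }

    pid$target::LWLockAcquire:return
    /self->ts/
    {
            @dist[self->mode == 0 ? "LW_EXCLUSIVE [ms]" : "LW_SHARED [ms]"] =
                lquantize((timestamp - self->ts) / 1000000, 0, 400, 10);
            self->ts = 0;
    }

    tick-10sec
    {
            printa(@dist);
            exit(0);
    }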
Re: [HACKERS] Why are we waiting?
"Staale Smedseng" <[EMAIL PROTECTED]> writes: > The stack trace shows that the only time the lock is acquired > exclusively is from the call to ProcArrayEndTransaction() in > CommitTransaction(). I'm not sure but I think that's only true in 8.3. As I understood it in 8.2 transaction start also needed the exclusive lock. > Also, an interesting observation is that the hot locks seem to have > changed from v8.2 to v8.3, making the ProcArrayLock more contended. See > the following outputs: > > PostgreSQL 8.2 (32-bit): >... > PostgreSQL 8.3 (64-bit): >... I'm not sure 32-bit and 64-bit cases are going to be directly comparable. We could have a problem with cache line aliasing on only one or the other for example. But that is a pretty striking difference. Does the 8.3 run complete more transactions in that time? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Why are we waiting?
On Wed, 2008-02-06 at 15:30 +0100, Staale Smedseng wrote: > On Mon, 2008-02-04 at 19:46, Simon Riggs wrote: > > We've got various facilities now for looking at LWLock waits, but I'd > > like to have more information about the *reasons* for lock waits. > > > > I know its possible to get backtraces in Dtrace at various tracepoints > > but that can be fairly hard to interpret. Thanks for this Staale. > Lock IdMode Count >ProcArrayLock Shared1821 >ProcArrayLock Exclusive 49762 > > Lock IdMode Combined Time (ns) >ProcArrayLock Shared 31114676303 >ProcArrayLock Exclusive 888356047549 Which looks like a mean service time of 17ms for X lock waiters. What is the frequency distribution of lock wait time on ProcArrayLock? Does the distribution vary over time? -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Why are we waiting?
On Mon, 2008-02-04 at 19:46, Simon Riggs wrote:
> We've got various facilities now for looking at LWLock waits, but I'd
> like to have more information about the *reasons* for lock waits.
>
> I know its possible to get backtraces in Dtrace at various tracepoints
> but that can be fairly hard to interpret.

Simon,

A couple of guys here at Sun have started looking at trying to improve the scaling on Sun's T2000 Niagara servers (single chip, 32 HW threads). We mostly use various DTrace scripts to monitor locking usage. We'd be happy to share results, probes and DTrace scripts for monitoring if you like.

So far we've found lock contention (especially for the ProcArrayLock) to be the likely source of our inability to saturate the CPU with a TPC-C like OLTP load with >1000 users.

The following shows the average time spent in and waiting for the exclusive ProcArrayLock vs. the time used in the transaction through commit (i.e., up until the exclusive ProcArrayLock acquisition that updates the PGPROC entry to mark the transaction as no longer running):

# ./pg-lwlock-procarray.d $(pgrep -n postgres)

              postgres`LWLockAcquire
              postgres`ProcArrayEndTransaction+0x10
              postgres`CommitTransaction+0xf0
              postgres`CommitTransactionCommand+0x90
              postgres`finish_xact_command+0x60
              postgres`exec_execute_message+0x3b4
              postgres`PostgresMain+0x13a0
              postgres`BackendRun+0x27c
              postgres`BackendStartup+0xe0
              postgres`ServerLoop+0x1a0
              postgres`PostmasterMain+0xcbc
              postgres`main+0x1d8
              postgres`_start+0x108
               23

  avg lwlock acquire service time [ns]         193051989
  transaction-commit [count]                          23
  transaction-start [count]                           23
  avg transaction time [ns]                     12763079

The stack trace shows that the only time the lock is acquired exclusively is from the call to ProcArrayEndTransaction() in CommitTransaction().

Also, an interesting observation is that the hot locks seem to have changed from v8.2 to v8.3, making the ProcArrayLock more contended. See the following outputs:

PostgreSQL 8.2 (32-bit):

-bash-3.00# ./825_lwlock_wait.d

            Lock Id       Mode        Count
   FirstLockMgrLock     Shared            2
      ProcArrayLock     Shared          209
         XidGenLock  Exclusive         1030
         XidGenLock     Shared         1215
      WALInsertLock  Exclusive         3942
    CLogControlLock     Shared         4113
      ProcArrayLock  Exclusive         6929
       WALWriteLock  Exclusive        17155
    CLogControlLock  Exclusive       128182

            Lock Id   Combined Time (ns)
   FirstLockMgrLock                24705
      WALInsertLock             79644210
         XidGenLock            179886846
      ProcArrayLock           1486950738
       WALWriteLock          18425400504
    CLogControlLock        1507388036453

PostgreSQL 8.3 (64-bit):

-bash-3.00# ./83_lwlock_wait.d

            Lock Id       Mode        Count
         SInvalLock  Exclusive            1
       WALWriteLock  Exclusive            1
         SInvalLock     Shared           20
    CLogControlLock     Shared          534
    CLogControlLock  Exclusive          845
         XidGenLock  Exclusive         1140
      ProcArrayLock     Shared         1821
      WALInsertLock  Exclusive        17043
      ProcArrayLock  Exclusive        49762

            Lock Id       Mode   Combined Time (ns)
         SInvalLock  Exclusive                17216
         SInvalLock     Shared               531129
       WALWriteLock  Exclusive              2003064
    CLogControlLock     Shared             61325819
    CLogControlLock  Exclusive             73509195
         XidGenLock  Exclusive            929477929
      WALInsertLock  Exclusive          17941476941
      ProcArrayLock     Shared          31114676303
      ProcArrayLock  Exclusive         888356047549

Regards,

Staale Smedseng
Database Technology Group, Sun Microsystems

---(end of broadcast)---
TIP 6: explain analyze is your friend
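The 825_/83_lwlock_wait.d scripts themselves are not included in the thread, but a minimal sketch of the same idea might look like the following. It assumes the postgresql provider's lwlock-startwait/lwlock-endwait probes (available since 8.2) carry (lockid, mode) arguments and that the LWLockMode ordering is 0 = LW_EXCLUSIVE, 1 = LW_SHARED; translating the numeric lock ids back into names needs a per-release lookup table, which is exactly where the mix-up discussed elsewhere in the thread crept in.

    /* lwlock_wait.d -- sketch only, not the actual 825_/83_lwlock_wait.d */
    postgresql*:::lwlock-startwait
    {
            self->wait_start = timestamp;
    }

    postgresql*:::lwlock-endwait
    /self->wait_start/
    {
            this->mode = arg1 == 0 ? "Exclusive" : "Shared";   /* assumed ordering */
            @counts[arg0, this->mode] = count();
            @times[arg0, this->mode]  = sum(timestamp - self->wait_start);
            self->wait_start = 0;
    }

    dtrace:::END
    {
            printf("%15s %12s %12s\n", "Lock Id", "Mode", "Count");
            printa("%15d %12s %@12d\n", @counts);
            printf("\n%15s %12s %20s\n", "Lock Id", "Mode", "Combined Time (ns)");
            printa("%15d %12s %@20d\n", @times);
    }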
Re: [HACKERS] Why are we waiting?
Simon Riggs <[EMAIL PROTECTED]> writes: > On Mon, 2008-02-04 at 17:06 -0500, Tom Lane wrote: >> Basically I'd rather try to attack the problem with dtrace ... > OK. I'll switch to Solaris. Or do you something I don't about dtrace on > linux? Nope :-(. The SystemTap guys keep promising support for userspace probes but it's not there yet. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Why are we waiting?
On Tue, 2008-02-05 at 14:14 +, Heikki Linnakangas wrote: > Simon Riggs wrote: > > On Mon, 2008-02-04 at 17:06 -0500, Tom Lane wrote: > > > >> Basically I'd rather try to attack the problem with dtrace ... > > > > OK. I'll switch to Solaris. Or do you something I don't about dtrace on > > linux? > > One idea would be to add new arguments to LWLockAcquire as you suggest, > but instead of modifying all call sites, decorate it with a macro that > passes __FILE__ and __LINE__ as the extra arguments. The good thing > about that is that it's a relatively small patch and you can easily > switch it on/off with a #ifdef. And there's no need to push for > inclusion of that into CVS, because it would be an easy patch to > maintain yourself. Thanks for the idea. It had occurred to me to make a private patch, but I prefer my patches to be open, so they're easier to discuss results arising from them. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Why are we waiting?
Simon Riggs wrote:
> On Mon, 2008-02-04 at 17:06 -0500, Tom Lane wrote:
>> Basically I'd rather try to attack the problem with dtrace ...
>
> OK. I'll switch to Solaris. Or do you something I don't about dtrace on
> linux?

One idea would be to add new arguments to LWLockAcquire as you suggest, but instead of modifying all call sites, decorate it with a macro that passes __FILE__ and __LINE__ as the extra arguments. The good thing about that is that it's a relatively small patch and you can easily switch it on/off with a #ifdef. And there's no need to push for inclusion of that into CVS, because it would be an easy patch to maintain yourself.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Why are we waiting?
On Mon, 2008-02-04 at 17:06 -0500, Tom Lane wrote: > Basically I'd rather try to attack the problem with dtrace ... OK. I'll switch to Solaris. Or do you know something I don't about dtrace on linux? -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Why are we waiting?
Simon Riggs <[EMAIL PROTECTED]> writes: > I'm thinking of adding an extra parameter onto every call to > LockBuffer() and LWLockAcquire() to explain the reason for the lock > request. Maybe I'm missing something, but I don't see what this would buy us, except for being able to track which call site resulted in a wait; which can already be measured with dtrace, no? I'm hesitant to decorate such widely-used functions with extra tracing information. You'd be breaking every third-party module and pending patch that uses either function, and imposing some performance penalty (probably not much, but it's hard to be sure) into the indefinite future, for a performance measurement need that might be fleeting. Basically I'd rather try to attack the problem with dtrace ... regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match