Re: [HACKERS] The BUFFER_HIT and BUFFER_MISS probes seem pretty darn redundant
Tom Lane wrote: Whilst cleaning up the dtrace probe documentation, I realized that there is nothing the aforementioned probes tell you that you can't find out (with a lot more data besides) from the final argument of the BUFFER_READ_DONE probe. Furthermore, tallying them as-is would be misleading since you couldn't distinguish true reads from relation extension requests. I think we should just remove them. Agreed. The found flag from BUFFER_READ_DONE will serve the same purpose. As always, thanks for offering great solutions! -Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Broken stuff in new dtrace probes
Tom Lane wrote: Greg Stark writes: i like the idea of just have a separate pair of probes for table extension. I bet there are people who would actually like to see that alone sometimes too. After further thought I concluded that the best solution for this is to add the isExtend flag to the buffer_read_start/read_done probe parameter lists. This allows the dtrace script writer to make the distinction if he chooses, without adding any extra overhead for normal non-traced operation. AFAICS using a separate probe type would add at least a couple of if-tests even with tracing turned off. I like this solution. From my perspective, it's always better to give the script writer the flexibility to pick and choose the data s/she wants to see and at the same time avoid adding new probes unnecessarily. -Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DTrace probes broken in HEAD on Solaris?
Tom Lane wrote: Zdenek Kotala writes: Answer why it happens when probes are disabled is, that for user application there are piece of code which prepare DTrace probes arguments which will be passed into kernel DTrace modul. This code has performance penalty which depends on number of arguments. The other thing I don't like is that this implementation exposes people to any bugs that may exist in the probe arguments, *even when they don't have any tracing turned on*. (Again, we had two different instances of that last week, so don't bother arguing that it doesn't happen.) Yes, the above scenario can occur when compiling with DTrace (passing --enable-dtrace to configure) even when the probes are not enabled. It won't be an issue if you don't compile with DTrace though as the macros are converted to no-ops. Hopefully, any bugs in the probe arguments would be caught during testing ;-) Both of these considerations are strong arguments for not building production installations with --enable-dtrace, just as we don't encourage people to build for production with --enable-cassert. But of course part of the argument for dtrace support is that people would like to have such probing available in production installations. What I've found out about this is that for each probe macro, DTrace also defines a foo_ENABLED() macro that can be used like this: if (foo_ENABLED()) foo(...); I think what we should do about these considerations is fix our macro definitions so that the if(ENABLED()) test is built into the macros. I'm not sure what this will require ... probably some post-processing of the probes.h file ... but if we don't do it we're going to keep getting bit. I was contemplating on using the is-enabled test, but when checking the arguments to all the probes, they were quite simple (except the relpath() call which is now removed). I think the is-enabled test will address the issues you encountered. I see a few alternative to fixing this: 1) Only use if (foo_ENABLED()) test for probes with expensive function call/computation in arguments. This will keep the code clean, and we can document this in the "Definine New Probes" section in the online doc. 2) Add the if(foo_ENABLED()) test to all probes manually and make this a requirement for all future probes. This makes the check explicit and avoid confusion. 3) Post-process probes.h so if(foo_ENABLED()) test is added to every probe. We're doing some post-processing now by pre-pending TRACE_ to the macros with a sed script. Personally, I don't like doing complex post-processing of output from another tool because the script can break if for some reason DTrace's output is changed. I prefer option 1 the most and 3 the least. -Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Broken stuff in new dtrace probes
Just returned from vacation ... Tom Lane wrote: I notice that we have in md.c TRACE_POSTGRESQL_SMGR_MD_READ_DONE(forknum, blocknum, reln->smgr_rnode.spcNode, reln->smgr_rnode.dbNode, reln->smgr_rnode.relNode, relpath(reln->smgr_rnode, forknum), nbytes, BLCKSZ); TRACE_POSTGRESQL_SMGR_MD_WRITE_DONE(forknum, blocknum, reln->smgr_rnode.spcNode, reln->smgr_rnode.dbNode, reln->smgr_rnode.relNode, relpath(reln->smgr_rnode, forknum), nbytes, BLCKSZ); relpath() returns a palloc'd string, which will not get freed, which means that any serious use of these probe points will shortly blow out the backend's memory. So far as I can see the path argument is redundant with the other probe arguments and we might as well just remove it --- objections? agreed; the other arguments can be used to get the same result. There's another problem in tuplesort.c: TRACE_POSTGRESQL_SORT_DONE(state->tapeset, (state->tapeset ? LogicalTapeSetBlocks(state->tapeset) : (state->allowedMem - state->availMem + 1023) / 1024)); This is called after state->tapeset has been freed, which means that the LogicalTapeSetBlocks call will very likely give a wrong answer, especially in assert-enabled builds but maybe even in a regular one. Good catch and thanks for reviewing the probes. -Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Additional DTrace Probes
ITAGAKI Takahiro wrote: This will introduce SLRU and Executor probes. We already have Lock, LWLock, Buffer, I/O and XLogs probes. I'd like to have the following probes, too. In my experience, they could be bottlenecks or consume much time in some situations. - idle in transaction - spinlock wait-and-retry - CPU: Trigger execution - CPU: Encoding conversion - Network: send() and recv() - smgr: lseek() calls - Tempfile reads and writes Great suggestions. If you have particular use cases in mind, please send them to the list. This will help determine what kind of data to be made available via the probes. It'll also be helpful if you know the locations for the probes. -Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Additional DTrace Probes
Zdenek Kotala wrote: I tested your patch and it looks good. however I have several comments/questions: 1) probes contains pointer but in probe.h it is declared as int. Is it correct? The probes cast the pointer to uintptr_t, so the correct type that will work for both ILP32 and LP64 models is unsigned long. I've made the correction from unsigned int to unsigned long. The reason uintptr_t is not used in the probe definition is because it causes compilation problem on Mac OS X. This is a known problem in the OS X's DTrace implementation. 2) Maybe TRACE_POSTGRESQL_SLRU_READPAGE_PHYSICAL_DONE(true, -1, -1); would be better. Because slru_errcause, slru_errno can contains garbage in situation when everything goes fine. Same for write. I've made the changes per your suggestion although one can argue that the script can check arg0, and if it's true, avoid using arg1 and arg2 as they are meaningless. I think it is committable for 8.4. That would be awesome! -Robert Index: src/backend/access/transam/slru.c === RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/slru.c,v retrieving revision 1.45 diff -u -3 -p -r1.45 slru.c --- src/backend/access/transam/slru.c 1 Jan 2009 17:23:36 - 1.45 +++ src/backend/access/transam/slru.c 8 Mar 2009 20:39:01 - @@ -57,6 +57,7 @@ #include "storage/fd.h" #include "storage/shmem.h" #include "miscadmin.h" +#include "pg_trace.h" /* @@ -372,6 +373,7 @@ SimpleLruReadPage(SlruCtl ctl, int pagen { SlruShared shared = ctl->shared; + TRACE_POSTGRESQL_SLRU_READPAGE_START((uintptr_t)ctl, pageno, write_ok, xid); /* Outer loop handles restart if we must wait for someone else's I/O */ for (;;) { @@ -399,6 +401,7 @@ SimpleLruReadPage(SlruCtl ctl, int pagen } /* Otherwise, it's ready to use */ SlruRecentlyUsed(shared, slotno); + TRACE_POSTGRESQL_SLRU_READPAGE_DONE(slotno); return slotno; } @@ -446,6 +449,7 @@ SimpleLruReadPage(SlruCtl ctl, int pagen SlruReportIOError(ctl, pageno, xid); SlruRecentlyUsed(shared, slotno); + TRACE_POSTGRESQL_SLRU_READPAGE_DONE(slotno); return slotno; } } @@ -470,6 +474,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, SlruShared shared = ctl->shared; int slotno; + TRACE_POSTGRESQL_SLRU_READPAGE_READONLY((uintptr_t)ctl, pageno, xid); + /* Try to find the page while holding only shared lock */ LWLockAcquire(shared->ControlLock, LW_SHARED); @@ -511,6 +517,8 @@ SimpleLruWritePage(SlruCtl ctl, int slot int pageno = shared->page_number[slotno]; boolok; + TRACE_POSTGRESQL_SLRU_WRITEPAGE_START((uintptr_t)ctl, pageno, slotno); + /* If a write is in progress, wait for it to finish */ while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS && shared->page_number[slotno] == pageno) @@ -525,7 +533,10 @@ SimpleLruWritePage(SlruCtl ctl, int slot if (!shared->page_dirty[slotno] || shared->page_status[slotno] != SLRU_PAGE_VALID || shared->page_number[slotno] != pageno) + { + TRACE_POSTGRESQL_SLRU_WRITEPAGE_DONE(); return; + } /* * Mark the slot write-busy, and clear the dirtybit. After this point, a @@ -569,6 +580,8 @@ SimpleLruWritePage(SlruCtl ctl, int slot /* Now it's okay to ereport if we failed */ if (!ok) SlruReportIOError(ctl, pageno, InvalidTransactionId); + + TRACE_POSTGRESQL_SLRU_WRITEPAGE_DONE(); } /* @@ -593,6 +606,8 @@ SlruPhysicalReadPage(SlruCtl ctl, int pa SlruFileName(ctl, path, segno); + TRACE_POSTGRESQL_SLRU_READPAGE_PHYSICAL_START((uintptr_t)ctl, path, pageno, slotno); + /* * In a crash-and-restart situation, it's possible for us to receive * commands to set the commit status of transactions whose bits are in @@ -607,6 +622,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pa { slru_errcause = SLRU_OPEN_FAILED; slru_errno = errno; + TRACE_POSTGRESQL_SLRU_READPAGE_PHYSICAL_DONE(false, slru_errcause, slru_errno); return false; } @@ -614,6 +630,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pa (errmsg("file \"%s\" doesn't exist, reading as zeroes", path))); MemSet(shared->page_buffer[slotno], 0, BLCKSZ); + TRACE_POSTGRESQL_SLRU_READPAGE_PHYSICAL_DONE(true, -1, -1); return true; } @@ -
[HACKERS] Additional DTrace Probes
The attached patch contains the probes (originally came from Theo Schlossnagle) that were removed from the previous submitted patch. Zdenek had some concerns about the way the probes were implemented http://archives.postgresql.org/pgsql-hackers/2008-07/msg01168.php. If there are specific recommendations, I'd be more than happy to make the changes and get them resubmitted. -Robert Index: src/backend/access/transam/slru.c === RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/slru.c,v retrieving revision 1.45 diff -u -3 -p -r1.45 slru.c --- src/backend/access/transam/slru.c 1 Jan 2009 17:23:36 - 1.45 +++ src/backend/access/transam/slru.c 6 Mar 2009 04:01:56 - @@ -57,6 +57,7 @@ #include "storage/fd.h" #include "storage/shmem.h" #include "miscadmin.h" +#include "pg_trace.h" /* @@ -372,6 +373,7 @@ SimpleLruReadPage(SlruCtl ctl, int pagen { SlruShared shared = ctl->shared; + TRACE_POSTGRESQL_SLRU_READPAGE_START((uintptr_t)ctl, pageno, write_ok, xid); /* Outer loop handles restart if we must wait for someone else's I/O */ for (;;) { @@ -399,6 +401,7 @@ SimpleLruReadPage(SlruCtl ctl, int pagen } /* Otherwise, it's ready to use */ SlruRecentlyUsed(shared, slotno); + TRACE_POSTGRESQL_SLRU_READPAGE_DONE(slotno); return slotno; } @@ -446,6 +449,7 @@ SimpleLruReadPage(SlruCtl ctl, int pagen SlruReportIOError(ctl, pageno, xid); SlruRecentlyUsed(shared, slotno); + TRACE_POSTGRESQL_SLRU_READPAGE_DONE(slotno); return slotno; } } @@ -470,6 +474,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, SlruShared shared = ctl->shared; int slotno; + TRACE_POSTGRESQL_SLRU_READPAGE_READONLY((uintptr_t)ctl, pageno, xid); + /* Try to find the page while holding only shared lock */ LWLockAcquire(shared->ControlLock, LW_SHARED); @@ -511,6 +517,8 @@ SimpleLruWritePage(SlruCtl ctl, int slot int pageno = shared->page_number[slotno]; boolok; + TRACE_POSTGRESQL_SLRU_WRITEPAGE_START((uintptr_t)ctl, pageno, slotno); + /* If a write is in progress, wait for it to finish */ while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS && shared->page_number[slotno] == pageno) @@ -525,7 +533,10 @@ SimpleLruWritePage(SlruCtl ctl, int slot if (!shared->page_dirty[slotno] || shared->page_status[slotno] != SLRU_PAGE_VALID || shared->page_number[slotno] != pageno) + { + TRACE_POSTGRESQL_SLRU_WRITEPAGE_DONE(); return; + } /* * Mark the slot write-busy, and clear the dirtybit. After this point, a @@ -569,6 +580,8 @@ SimpleLruWritePage(SlruCtl ctl, int slot /* Now it's okay to ereport if we failed */ if (!ok) SlruReportIOError(ctl, pageno, InvalidTransactionId); + + TRACE_POSTGRESQL_SLRU_WRITEPAGE_DONE(); } /* @@ -593,6 +606,8 @@ SlruPhysicalReadPage(SlruCtl ctl, int pa SlruFileName(ctl, path, segno); + TRACE_POSTGRESQL_SLRU_READPAGE_PHYSICAL_START((uintptr_t)ctl, path, pageno, slotno); + /* * In a crash-and-restart situation, it's possible for us to receive * commands to set the commit status of transactions whose bits are in @@ -607,6 +622,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pa { slru_errcause = SLRU_OPEN_FAILED; slru_errno = errno; + TRACE_POSTGRESQL_SLRU_READPAGE_PHYSICAL_DONE(false, slru_errcause, slru_errno); return false; } @@ -614,6 +630,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pa (errmsg("file \"%s\" doesn't exist, reading as zeroes", path))); MemSet(shared->page_buffer[slotno], 0, BLCKSZ); + TRACE_POSTGRESQL_SLRU_READPAGE_PHYSICAL_DONE(true, slru_errcause, slru_errno); return true; } @@ -622,6 +639,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pa slru_errcause = SLRU_SEEK_FAILED; slru_errno = errno; close(fd); + TRACE_POSTGRESQL_SLRU_READPAGE_PHYSICAL_DONE(false, slru_errcause, slru_errno); return false; } @@ -631,6 +649,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pa slru_errcause = SLRU_READ_FAILED; slru_errno = errno; close(fd); + TRACE_POSTGRESQL_SLRU_READPAGE_PHYSICAL_DONE(false, slru_errcause, slru_errno);
[HACKERS] DTrace doc patch for new probes in 8.4
Attached is the doc patch for the recently added probes. -Robert Index: doc/src/sgml/monitoring.sgml === RCS file: /projects/cvsroot/pgsql/doc/src/sgml/monitoring.sgml,v retrieving revision 1.63 diff -u -3 -p -r1.63 monitoring.sgml --- doc/src/sgml/monitoring.sgml11 Nov 2008 20:06:21 - 1.63 +++ doc/src/sgml/monitoring.sgml26 Feb 2009 22:18:31 - @@ -1009,23 +1009,25 @@ SELECT pg_stat_get_backend_pid(s.backend PostgreSQL provides facilities to support dynamic tracing of the database server. This allows an external utility to be called at specific points in the code and thereby trace - execution. Currently, this facility is primarily intended for use by - database developers, as it requires substantial familiarity with the code. + execution. - A number of probes or trace points are already inserted - into the source code. By default these probes are not compiled into the + A number of probes or trace points are already inserted into the source + code. These probes are intended to be used by database developers and + administrators. By default the probes are not compiled into the binary, and the user needs to explicitly tell the configure script to make the probes available in PostgreSQL. - Currently, only the DTrace utility is supported, which is available - on Solaris Express, Solaris 10, and Mac OS X Leopard. It is expected that - DTrace will be available in the future on FreeBSD. - Supporting other dynamic tracing utilities is theoretically possible by - changing the definitions for the macros in + Currently, only the + http://opensolaris.org/os/community/dtrace/";>DTrace + utility is supported, which is available + on OpenSolaris, Solaris 10, and Mac OS X Leopard. It is expected that + DTrace will be available in the future on FreeBSD and possibly other + operating systems. Supporting other dynamic tracing utilities is + theoretically possible by changing the definitions for the macros in src/include/utils/probes.h. @@ -1045,93 +1047,387 @@ SELECT pg_stat_get_backend_pid(s.backend Built-in Probes - A few standard probes are provided in the source code - (of course, more can be added as needed for a particular problem). - These are shown in . + A number of standard probes are provided in the source code, and more + can certainly be added to enhance PostgreSQL's observability. There are two categories + of probes, those that are targeted toward database administrators and those for developers. + They are shown in and + . - - Built-in Probes + + Built-in Probes for Administrators Name Parameters - Overview + Description + transaction-start - (int transactionId) - The start of a new transaction. + (LocalTransactionId) + Probe that fires at the start of a new transaction. arg0 is the transaction id. transaction-commit - (int transactionId) - The successful completion of a transaction. + (LocalTransactionId) + Probe that fires when a transaction completes successfully. arg0 is the transaction id. transaction-abort - (int transactionId) - The unsuccessful completion of a transaction. + (LocalTransactionId) + Probes that fires when a transaction does not complete successfully. arg0 is the transaction id. + + + query-start + (const char *) + Probe that fires when the execution of a statement is started. arg0 is the query string. + + + query-done + (const char *) + Probe that fires when the execution of a statement is complete. arg0 is the query string. + + + query-parse-start + (const char *) + Probe that fires when the parsing of a query is started. arg0 is the query string. + + + query-parse-done + (const char *) + Probe that fires when the parsing of a query is complete. arg0 is the query string. + + + query-rewrite-start + (const char *) + Probe that fires when the rewriting of a query is started. arg0 is the query string. + + + query-rewrite-done + (const char *) + Probe that fires when the rewriting of a query is complete. arg0 is the query string. + + + query-plan-start + () + Probe that fires when the planning of a query is started. + + + query-plan-done + () + Probe that fires when the planning of a query is complete. + + + query-execute-start + () + Probe that fires when the execution of a query is started. + + + query-execute-done + () + Probe that fires when the execution of a query is complete. + + + statement-status + (const char *) + Probe that fires anytime an SQL statement is executed on the server.
Re: [HACKERS] DTrace probes patch
Tom Lane wrote: Robert Lor writes: Tom Lane wrote: I agree. If the probe is meant to track only *some* WAL writes then it needs to be named something less generic than TRACE_POSTGRESQL_WAL_BUFFER_WRITE. How about change it to TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY similar to TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY for shared buffers? Works for me... Attached is the patch for the above name change. -Robert Index: src/backend/access/transam/xlog.c === RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xlog.c,v retrieving revision 1.324 diff -u -3 -p -r1.324 xlog.c --- src/backend/access/transam/xlog.c 17 Dec 2008 01:39:03 - 1.324 +++ src/backend/access/transam/xlog.c 22 Dec 2008 16:28:00 - @@ -1318,14 +1318,14 @@ AdvanceXLInsertBuffer(bool new_segment) * Have to write buffers while holding insert lock. This is * not good, so only write as much as we absolutely must. */ - TRACE_POSTGRESQL_WAL_BUFFER_WRITE_START(); + TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_START(); WriteRqst.Write = OldPageRqstPtr; WriteRqst.Flush.xlogid = 0; WriteRqst.Flush.xrecoff = 0; XLogWrite(WriteRqst, false, false); LWLockRelease(WALWriteLock); Insert->LogwrtResult = LogwrtResult; - TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DONE(); + TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE(); } } } Index: src/backend/utils/probes.d === RCS file: /projects/cvsroot/pgsql/src/backend/utils/probes.d,v retrieving revision 1.4 diff -u -3 -p -r1.4 probes.d --- src/backend/utils/probes.d 17 Dec 2008 01:39:04 - 1.4 +++ src/backend/utils/probes.d 22 Dec 2008 16:28:01 - @@ -89,6 +89,6 @@ provider postgresql { probe xlog__insert(unsigned char, unsigned char); probe xlog__switch(); - probe wal__buffer__write__start(); - probe wal__buffer__write__done(); + probe wal__buffer__write__dirty__start(); + probe wal__buffer__write__dirty__done(); }; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DTrace probes patch
Tom Lane wrote: "Fujii Masao" writes: I understood your intention. But, I think that its function name is somewhat confusing. I agree. If the probe is meant to track only *some* WAL writes then it needs to be named something less generic than TRACE_POSTGRESQL_WAL_BUFFER_WRITE. How about change it to TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY similar to TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY for shared buffers? -Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DTrace probes patch
Alvaro Herrera wrote: But there are 5 callers of XLogWrite ... why aren't the other ones being tracked too? This probe originally came from Simon, so it can't possibly be wrong :-) My understanding is that we only want to track the XLogWrite when advancing to the next buffer page, and if there is unwritten data in the new buffer page, that indicates no more empty WAL buffer pages available, but I may be wrong. I did some tests by adjusting wal_buffers, and I could observe this behavior, more calls to XLogWrite with smaller wal_buffers. -Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DTrace probes patch
Fujii Masao wrote: Hi, On Wed, Dec 17, 2008 at 4:53 AM, Robert Lor wrote: @@ -1313,12 +1318,14 @@ AdvanceXLInsertBuffer(bool new_segment) * Have to write buffers while holding insert lock. This is * not good, so only write as much as we absolutely must. */ + TRACE_POSTGRESQL_WAL_BUFFER_WRITE_START(); WriteRqst.Write = OldPageRqstPtr; WriteRqst.Flush.xlogid = 0; WriteRqst.Flush.xrecoff = 0; XLogWrite(WriteRqst, false, false); LWLockRelease(WALWriteLock); Insert->LogwrtResult = LogwrtResult; + TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DONE(); Why is TRACE_POSTGRESQL_WAL_BUFFER_WRITE_START/DONE called only in AdvanceXLInsertBuffer? We can trace only a part of WAL buffer write? The intention of these probes is to determine if wal_buffers is too small by monitoring how frequent the server has to write out the buffers along with the I/O time. -Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DTrace probes patch
Bruce Momjian wrote: I am seeing the following error when linking the backend on a BSD machine: The updated patch attached should fix this problem. My bad for overlooking this. -Robert Index: src/backend/access/transam/xlog.c === RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xlog.c,v retrieving revision 1.322 diff -u -3 -p -r1.322 xlog.c --- src/backend/access/transam/xlog.c 9 Nov 2008 17:51:15 - 1.322 +++ src/backend/access/transam/xlog.c 16 Dec 2008 19:46:08 - @@ -48,6 +48,7 @@ #include "utils/builtins.h" #include "utils/guc.h" #include "utils/ps_status.h" +#include "pg_trace.h" /* File path names (all relative to $PGDATA) */ @@ -486,6 +487,8 @@ XLogInsert(RmgrId rmid, uint8 info, XLog if (info & XLR_INFO_MASK) elog(PANIC, "invalid xlog info mask %02X", info); + TRACE_POSTGRESQL_XLOG_INSERT(rmid, info); + /* * In bootstrap mode, we don't actually log anything but XLOG resources; * return a phony record pointer. @@ -914,6 +917,8 @@ begin:; XLogwrtRqst FlushRqst; XLogRecPtr OldSegEnd; + TRACE_POSTGRESQL_XLOG_SWITCH(); + LWLockAcquire(WALWriteLock, LW_EXCLUSIVE); /* @@ -1313,12 +1318,14 @@ AdvanceXLInsertBuffer(bool new_segment) * Have to write buffers while holding insert lock. This is * not good, so only write as much as we absolutely must. */ + TRACE_POSTGRESQL_WAL_BUFFER_WRITE_START(); WriteRqst.Write = OldPageRqstPtr; WriteRqst.Flush.xlogid = 0; WriteRqst.Flush.xrecoff = 0; XLogWrite(WriteRqst, false, false); LWLockRelease(WALWriteLock); Insert->LogwrtResult = LogwrtResult; + TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DONE(); } } } @@ -5904,6 +5911,8 @@ CreateCheckPoint(int flags) if (log_checkpoints) LogCheckpointStart(flags); + TRACE_POSTGRESQL_CHECKPOINT_START(flags); + /* * Before flushing data, we must wait for any transactions that are * currently in their commit critical sections. If an xact inserted its @@ -6069,6 +6078,11 @@ CreateCheckPoint(int flags) if (log_checkpoints) LogCheckpointEnd(); +TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written, +NBuffers, CheckpointStats.ckpt_segs_added, +CheckpointStats.ckpt_segs_removed, +CheckpointStats.ckpt_segs_recycled); + LWLockRelease(CheckpointLock); } Index: src/backend/storage/buffer/bufmgr.c === RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v retrieving revision 1.242 diff -u -3 -p -r1.242 bufmgr.c --- src/backend/storage/buffer/bufmgr.c 19 Nov 2008 10:34:52 - 1.242 +++ src/backend/storage/buffer/bufmgr.c 16 Dec 2008 19:46:11 - @@ -203,8 +203,7 @@ ReadBuffer_common(SMgrRelation smgr, boo if (isExtend) blockNum = smgrnblocks(smgr, forkNum); - TRACE_POSTGRESQL_BUFFER_READ_START(blockNum, smgr->smgr_rnode.spcNode, - smgr->smgr_rnode.dbNode, smgr->smgr_rnode.relNode, isLocalBuf); + TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum, smgr->smgr_rnode.spcNode, smgr->smgr_rnode.dbNode, smgr->smgr_rnode.relNode, isLocalBuf); if (isLocalBuf) { @@ -253,7 +252,7 @@ ReadBuffer_common(SMgrRelation smgr, boo if (VacuumCostActive) VacuumCostBalance += VacuumCostPageHit; - TRACE_POSTGRESQL_BUFFER_READ_DONE(blockNum, + TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum, smgr->smgr_rnode.spcNode, smgr->smgr_rnode.dbNode, smgr->smgr_rnode.relNode, isLocalBuf, found); @@ -380,9 +379,9 @@ ReadBuffer_common(SMgrRelation smgr, boo if (VacuumCostActive) VacuumCostBalance += VacuumCostPageMiss; - TRACE_POSTGRESQL_BUFFER_READ_DONE(blockNum, smgr->smgr_rnode.spcNode, - smgr->smgr_rnode.dbNode, smgr->smgr_rnode.relNode, - isLocalBuf, found); + TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum, + smgr->smgr_rnode.spcNode, smgr->smgr_rnode.dbNode, + smgr->smgr_rnode.relNode, isLocalBuf, found); return BufferDescriptorGetB
Re: [HACKERS] DTrace probes patch
Bruce Momjian wrote: Should I apply this or hold it for 8.5? I think it should go into 8.4 as it also fixes existing problems. -Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DTrace probes patch
Peter Eisentraut wrote: Robert Lor wrote: The attached patch contains a couple of fixes in the existing probes and includes a few new ones. - Fixed compilation errors on OS X for probes that use typedefs Could you explain what these errors are about? I don't see any errors on my machine. In the current probes.d, the following probe definitions are commented out because they cause compilation errors on OS X. * probe lock__wait__start(unsigned int, LOCKMODE); * probe lock__wait__done(unsigned int, LOCKMODE); * probe buffer__read__start(BlockNumber, Oid, Oid, Oid, bool); * probe buffer__read__done(BlockNumber, Oid, Oid, Oid, bool, bool); The problem was fixed by making the changes below. probes.d is preprocessed with cpp and as such only macros get expanded. From: typedef unsigned int LocalTransactionId; typedef int LWLockId; typedef int LWLockMode; typedef int LOCKMODE; typedef unsigned int BlockNumber; typedef unsigned int Oid; typedef int ForkNumber; To: #define LocalTransactionId unsigned int #define LWLockId int #define LWLockMode int #define LOCKMODE int #define BlockNumber unsigned int #define Oid unsigned int #define ForkNumber int -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] DTrace probes patch
The attached patch contains a couple of fixes in the existing probes and includes a few new ones. - Fixed compilation errors on OS X for probes that use typedefs - Fixed a number of probes to pass ForkNumber per the relation forks patch - The new probes are those that were taken out from the previous submitted patch and required simple fixes. Will submit the other probes that may require more discussion in a separate patch. -Robert Index: src/backend/access/transam/xlog.c === RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xlog.c,v retrieving revision 1.322 diff -u -3 -p -r1.322 xlog.c --- src/backend/access/transam/xlog.c 9 Nov 2008 17:51:15 - 1.322 +++ src/backend/access/transam/xlog.c 15 Dec 2008 05:12:41 - @@ -48,6 +48,7 @@ #include "utils/builtins.h" #include "utils/guc.h" #include "utils/ps_status.h" +#include "pg_trace.h" /* File path names (all relative to $PGDATA) */ @@ -486,6 +487,8 @@ XLogInsert(RmgrId rmid, uint8 info, XLog if (info & XLR_INFO_MASK) elog(PANIC, "invalid xlog info mask %02X", info); + TRACE_POSTGRESQL_XLOG_INSERT(rmid, info); + /* * In bootstrap mode, we don't actually log anything but XLOG resources; * return a phony record pointer. @@ -914,6 +917,8 @@ begin:; XLogwrtRqst FlushRqst; XLogRecPtr OldSegEnd; + TRACE_POSTGRESQL_XLOG_SWITCH(); + LWLockAcquire(WALWriteLock, LW_EXCLUSIVE); /* @@ -1313,12 +1318,14 @@ AdvanceXLInsertBuffer(bool new_segment) * Have to write buffers while holding insert lock. This is * not good, so only write as much as we absolutely must. */ + TRACE_POSTGRESQL_WAL_BUFFER_WRITE_START(); WriteRqst.Write = OldPageRqstPtr; WriteRqst.Flush.xlogid = 0; WriteRqst.Flush.xrecoff = 0; XLogWrite(WriteRqst, false, false); LWLockRelease(WALWriteLock); Insert->LogwrtResult = LogwrtResult; + TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DONE(); } } } @@ -5904,6 +5911,8 @@ CreateCheckPoint(int flags) if (log_checkpoints) LogCheckpointStart(flags); + TRACE_POSTGRESQL_CHECKPOINT_START(flags); + /* * Before flushing data, we must wait for any transactions that are * currently in their commit critical sections. If an xact inserted its @@ -6069,6 +6078,11 @@ CreateCheckPoint(int flags) if (log_checkpoints) LogCheckpointEnd(); +TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written, +NBuffers, CheckpointStats.ckpt_segs_added, +CheckpointStats.ckpt_segs_removed, +CheckpointStats.ckpt_segs_recycled); + LWLockRelease(CheckpointLock); } Index: src/backend/storage/buffer/bufmgr.c === RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v retrieving revision 1.242 diff -u -3 -p -r1.242 bufmgr.c --- src/backend/storage/buffer/bufmgr.c 19 Nov 2008 10:34:52 - 1.242 +++ src/backend/storage/buffer/bufmgr.c 15 Dec 2008 05:12:45 - @@ -203,8 +203,7 @@ ReadBuffer_common(SMgrRelation smgr, boo if (isExtend) blockNum = smgrnblocks(smgr, forkNum); - TRACE_POSTGRESQL_BUFFER_READ_START(blockNum, smgr->smgr_rnode.spcNode, - smgr->smgr_rnode.dbNode, smgr->smgr_rnode.relNode, isLocalBuf); + TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum, smgr->smgr_rnode.spcNode, smgr->smgr_rnode.dbNode, smgr->smgr_rnode.relNode, isLocalBuf); if (isLocalBuf) { @@ -253,7 +252,7 @@ ReadBuffer_common(SMgrRelation smgr, boo if (VacuumCostActive) VacuumCostBalance += VacuumCostPageHit; - TRACE_POSTGRESQL_BUFFER_READ_DONE(blockNum, + TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum, smgr->smgr_rnode.spcNode, smgr->smgr_rnode.dbNode, smgr->smgr_rnode.relNode, isLocalBuf, found); @@ -380,9 +379,9 @@ ReadBuffer_common(SMgrRelation smgr, boo if (VacuumCostActive) VacuumCostBalance += VacuumCostPageMiss; - TRACE_POSTGRESQL_BUFFER_READ_DONE(blockNum, smgr->smgr_rnode.spcNode, - smgr->smgr_rnode.dbNode, smgr->smgr_rnode.relNode, - isLocalBuf,
Re: [HACKERS] Review: DTrace probes (merged version) ver_03
Alvaro Herrera wrote: Greg Smith wrote: One tiny change I'd suggest here: if you look at the code for checkpoint buffer writing there are traces for two points in the process: CheckPointBuffers(int flags) { + TRACE_POSTGRESQL_BUFFER_CHECKPOINT_START(flags); CheckpointStats.ckpt_write_t = GetCurrentTimestamp(); BufferSync(flags); CheckpointStats.ckpt_sync_t = GetCurrentTimestamp(); smgrsync(); CheckpointStats.ckpt_sync_end_t = GetCurrentTimestamp(); + TRACE_POSTGRESQL_BUFFER_CHECKPOINT_DONE(); } Note how the existing code also tracks how long the sync phase took compared to the write one, and reports both numbers in the checkpoint logs. It would be nice to add another probe at that same point (just after ckpt_sync_t is set) so that dtrace users could instrument all these possibilities as well: just buffer write time/resources, just sync ones, or both. Sounds like the thing to do would be to pass CheckpointStats into the DONE probe. I like this approach as it avoids the need to have too many probes. I will make this change and get it in with the remaining probes for the next commit fest. -- Robert Lor Sun Microsystems Austin, USA http://sun.com/postgresql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: DTrace probes (merged version) ver_03
Alvaro Herrera wrote: Alvaro Herrera wrote: Applied. I forgot to mention that I renamed the sort_end probe to sort_done, to keep the naming convention. It is a backwards-incompatible change, but there were plenty other probes renamed so my guess is that one more does not matter all that much. It was overlooked :-( . Good catch! Also, it seems I cannot sort pg_trace and pgstat consistently :-( Not sure what you're trying to say here! -- Robert Lor Sun Microsystems Austin, USA http://sun.com/postgresql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: DTrace probes (merged version) ver_03
Alvaro Herrera wrote: Applied. Thanks! How come "Oid" works for FLUSH_START but not READ_START and READ_DONE? I'll get the answer for this. Also, I wonder if there's any proof that this works at all on Mac OS X, given that the rule to create probes.o from probes.d is conditionally pulled in only for Solaris? It does work on OS X. I dare everyone to try it :-) The implementation of DTrace on OS X is a bit different than on Solaris, so the rule your referring to is only needed for Solaris. -- Robert Lor Sun Microsystems Austin, USA http://sun.com/postgresql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: DTrace probes (merged version) ver_03
Alvaro Herrera wrote: I was checking the DTrace docs for other reasons and I came across this, which maybe can be useful here: http://docs.sun.com/app/docs/doc/817-6223/chp-xlate?a=view Yes, I think using the translator is the best approach to expose internal structures in a stable manner. -- Robert Lor Sun Microsystems Austin, USA http://sun.com/postgresql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: DTrace probes (merged version) ver_03
Alvaro Herrera wrote: Here's what I have. Please confirm that this compiles for you. I made some changes to the sed script so it works with the sed on Solaris & OS X. I tested this patch on both Solaris and OS X with DTrace enabled and disabled and also verified that the sed script works with GNU sed. I hope this is the final change for this patch. Thanks for catching all the issues, and my bad for not testing with DTrace disabled. -- Robert Lor Sun Microsystems Austin, USA http://sun.com/postgresql Index: src/backend/Makefile === RCS file: /projects/cvsroot/pgsql/src/backend/Makefile,v retrieving revision 1.128 diff -u -3 -p -r1.128 Makefile --- src/backend/Makefile17 Mar 2008 19:44:40 - 1.128 +++ src/backend/Makefile1 Aug 2008 03:56:13 - @@ -147,7 +147,7 @@ $(top_builddir)/src/include/utils/probes ifeq ($(PORTNAME), solaris) utils/probes.o: utils/probes.d $(SUBDIROBJS) - $(DTRACE) $(DTRACEFLAGS) -G -s $(call expand_subsys,$^) -o $@ + $(DTRACE) $(DTRACEFLAGS) -C -G -s $(call expand_subsys,$^) -o $@ endif Index: src/backend/access/transam/clog.c === RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/clog.c,v retrieving revision 1.46 diff -u -3 -p -r1.46 clog.c --- src/backend/access/transam/clog.c 1 Jan 2008 19:45:46 - 1.46 +++ src/backend/access/transam/clog.c 1 Aug 2008 03:56:14 - @@ -36,6 +36,7 @@ #include "access/slru.h" #include "access/transam.h" #include "postmaster/bgwriter.h" +#include "pg_trace.h" /* * Defines for CLOG page sizes. A page is the same BLCKSZ as is used @@ -313,7 +314,9 @@ void ShutdownCLOG(void) { /* Flush dirty CLOG pages to disk */ + TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(false); SimpleLruFlush(ClogCtl, false); + TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE(false); } /* @@ -323,7 +326,9 @@ void CheckPointCLOG(void) { /* Flush dirty CLOG pages to disk */ + TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(true); SimpleLruFlush(ClogCtl, true); + TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE(true); } Index: src/backend/access/transam/multixact.c === RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/multixact.c,v retrieving revision 1.27 diff -u -3 -p -r1.27 multixact.c --- src/backend/access/transam/multixact.c 1 Jan 2008 19:45:46 - 1.27 +++ src/backend/access/transam/multixact.c 1 Aug 2008 03:56:15 - @@ -57,6 +57,7 @@ #include "storage/lmgr.h" #include "utils/memutils.h" #include "storage/procarray.h" +#include "pg_trace.h" /* @@ -1497,8 +1498,10 @@ void ShutdownMultiXact(void) { /* Flush dirty MultiXact pages to disk */ + TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_START(false); SimpleLruFlush(MultiXactOffsetCtl, false); SimpleLruFlush(MultiXactMemberCtl, false); + TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(false); } /* @@ -1526,6 +1529,8 @@ MultiXactGetCheckptMulti(bool is_shutdow void CheckPointMultiXact(void) { + TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_START(true); + /* Flush dirty MultiXact pages to disk */ SimpleLruFlush(MultiXactOffsetCtl, true); SimpleLruFlush(MultiXactMemberCtl, true); @@ -1540,6 +1545,8 @@ CheckPointMultiXact(void) */ if (!InRecovery) TruncateMultiXact(); + + TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(true); } /* Index: src/backend/access/transam/subtrans.c === RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/subtrans.c,v retrieving revision 1.22 diff -u -3 -p -r1.22 subtrans.c --- src/backend/access/transam/subtrans.c 26 Mar 2008 18:48:59 - 1.22 +++ src/backend/access/transam/subtrans.c 1 Aug 2008 03:56:15 - @@ -32,6 +32,7 @@ #include "access/subtrans.h" #include "access/transam.h" #include "utils/snapmgr.h" +#include "pg_trace.h" /* @@ -265,7 +266,9 @@ ShutdownSUBTRANS(void) * This is not actually necessary from a correctness point of view. We do * it merely as a debugging aid. */ + TRACE_POSTGRESQL_SUBTRANS_CHECKPOINT_START(false); SimpleLruFlush(SubTransCtl, false); + TRACE_POSTGRESQL_SUBTRANS_CHECKPOINT_DONE(false); } /* @@ -281,7 +284,9 @@ CheckPointSUBTRANS(void) * it merely to improve the odds that writing of dirty pages is done by * the checkpoint process and not by backends. */ + TRACE_POSTGRESQL_SUBTRANS_CHECKPOINT_START(true); Sim
Re: [HACKERS] Review: DTrace probes (merged version) ver_03
Alvaro Herrera wrote: But I don't see a reason to define the rest: +typedef unsigned int locktag_field2; +typedef const char * query_string; +typedef int sortType; +typedef int trueFalse; +typedef int nkeys; +typedef int workMem; +typedef int randomAccess; +typedef unsigned long LogicalTapeSetPtr; +typedef long spaceUsed; +typedef int isLocalBuf; +typedef int found; +typedef int flags; +typedef int num_to_write; +typedef int num_written; +typedef int NBuffers; +typedef int buf_id; I think you should add a #define Size, perhaps #define bool, and use those where applicable, and the plain types (int, long, etc) in the rest. Fixed. Patch attached. That Mac OS X problem merits some extra investigation, I think. I'm investigating this one and will find the root cause, but I don't think it should hold back this patch. Other than this, I think this patch can be committed. I'd appreciate if it can be committed today. Alvaro, thanks a bunch for the feedback! -- Robert Lor Sun Microsystems Austin, USA http://sun.com/postgresql Index: src/backend/access/transam/clog.c === RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/clog.c,v retrieving revision 1.46 diff -u -3 -p -r1.46 clog.c --- src/backend/access/transam/clog.c 1 Jan 2008 19:45:46 - 1.46 +++ src/backend/access/transam/clog.c 31 Jul 2008 20:09:15 - @@ -36,6 +36,7 @@ #include "access/slru.h" #include "access/transam.h" #include "postmaster/bgwriter.h" +#include "pg_trace.h" /* * Defines for CLOG page sizes. A page is the same BLCKSZ as is used @@ -313,7 +314,9 @@ void ShutdownCLOG(void) { /* Flush dirty CLOG pages to disk */ + TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(false); SimpleLruFlush(ClogCtl, false); + TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE(false); } /* @@ -323,7 +326,9 @@ void CheckPointCLOG(void) { /* Flush dirty CLOG pages to disk */ + TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(true); SimpleLruFlush(ClogCtl, true); + TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE(true); } Index: src/backend/access/transam/multixact.c === RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/multixact.c,v retrieving revision 1.27 diff -u -3 -p -r1.27 multixact.c --- src/backend/access/transam/multixact.c 1 Jan 2008 19:45:46 - 1.27 +++ src/backend/access/transam/multixact.c 31 Jul 2008 20:09:16 - @@ -57,6 +57,7 @@ #include "storage/lmgr.h" #include "utils/memutils.h" #include "storage/procarray.h" +#include "pg_trace.h" /* @@ -1497,8 +1498,10 @@ void ShutdownMultiXact(void) { /* Flush dirty MultiXact pages to disk */ + TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_START(false); SimpleLruFlush(MultiXactOffsetCtl, false); SimpleLruFlush(MultiXactMemberCtl, false); + TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(false); } /* @@ -1526,6 +1529,8 @@ MultiXactGetCheckptMulti(bool is_shutdow void CheckPointMultiXact(void) { + TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_START(true); + /* Flush dirty MultiXact pages to disk */ SimpleLruFlush(MultiXactOffsetCtl, true); SimpleLruFlush(MultiXactMemberCtl, true); @@ -1540,6 +1545,8 @@ CheckPointMultiXact(void) */ if (!InRecovery) TruncateMultiXact(); + + TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(true); } /* Index: src/backend/access/transam/subtrans.c === RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/subtrans.c,v retrieving revision 1.22 diff -u -3 -p -r1.22 subtrans.c --- src/backend/access/transam/subtrans.c 26 Mar 2008 18:48:59 - 1.22 +++ src/backend/access/transam/subtrans.c 31 Jul 2008 20:09:17 - @@ -32,6 +32,7 @@ #include "access/subtrans.h" #include "access/transam.h" #include "utils/snapmgr.h" +#include "pg_trace.h" /* @@ -265,7 +266,9 @@ ShutdownSUBTRANS(void) * This is not actually necessary from a correctness point of view. We do * it merely as a debugging aid. */ + TRACE_POSTGRESQL_SUBTRANS_CHECKPOINT_START(false); SimpleLruFlush(SubTransCtl, false); + TRACE_POSTGRESQL_SUBTRANS_CHECKPOINT_DONE(false); } /* @@ -281,7 +284,9 @@ CheckPointSUBTRANS(void) * it merely to improve the odds that writing of dirty pages is done by * the checkpoint process and not by backends. */ + TRACE_POSTGRESQL_SUBTRANS_CHECKPOINT_START(true); SimpleLruFlush(SubTransCtl, true); + TRACE_POSTGRESQL_SUBTRANS_CHECKPOINT_DONE(true); } Index: src/backend/access/transam/twophase.c
Re: [HACKERS] Review: DTrace probes (merged version) ver_03
Updated patch attached. This patch addresses all of Alvaro's feedback except the new probes for v3 protocol which will be submitted later along with the rest of the probes. It also incorporates Tom's feedback as explained inline below. I hope this patch is good to go for this commit fest. Will take care of the rest in the next fest. Tom Lane wrote: Zdenek Kotala <[EMAIL PROTECTED]> writes: I performed review and I prepared own patch which contains only probes without any issue. I suggest commit this patch because the rest of patch is independent and it can be committed next commit fest after rework. I looked at this patch a little bit. In addition to the comments Alvaro made, I have a couple more issues: * The probes that pass buffer tag elements are already broken by the pending "relation forks" patch: there is soon going to be another field in buffer tags. Perhaps it'd be feasible to pass the buffer tag as a single probe argument to make that a bit more future-proof? I'm not sure if that would complicate the use of the probe so much as to be counterproductive. Took out the buffer tag argument for now. Will figure out how to best solve this after this relation forks patch is committed. What I suggest might be a reasonable compromise is to copy needed typedefs directly into the probes.d file: typedef unsigned int LocalTransactionId; provider postgresql { probe transaction__start(LocalTransactionId); This at least makes it possible to declare the probes cleanly, and it's fairly obvious what to fix if the principal definition of LocalTransactionId ever changes. I don't have Solaris to test on, but on OS X this seems to behave the way we'd like: the typedef itself isn't copied into the emitted probes.h file, but the emitted extern declarations use it. Implemented this suggestion. There are some weirdness with the OS X compiler causing some of the probe declarations not to compile (see comments in probe.d). The compiler spits out some warnings because the types don't show up in the function prototype in probes.h, but the probes work okay. I think we can safely ignore the warnings. -- Robert Lor Sun Microsystems Austin, USA http://sun.com/postgresql Index: src/backend/access/transam/clog.c === RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/clog.c,v retrieving revision 1.46 diff -u -3 -p -r1.46 clog.c --- src/backend/access/transam/clog.c 1 Jan 2008 19:45:46 - 1.46 +++ src/backend/access/transam/clog.c 30 Jul 2008 04:02:32 - @@ -36,6 +36,7 @@ #include "access/slru.h" #include "access/transam.h" #include "postmaster/bgwriter.h" +#include "pg_trace.h" /* * Defines for CLOG page sizes. A page is the same BLCKSZ as is used @@ -313,7 +314,9 @@ void ShutdownCLOG(void) { /* Flush dirty CLOG pages to disk */ + TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(false); SimpleLruFlush(ClogCtl, false); + TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE(false); } /* @@ -323,7 +326,9 @@ void CheckPointCLOG(void) { /* Flush dirty CLOG pages to disk */ + TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(true); SimpleLruFlush(ClogCtl, true); + TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE(true); } Index: src/backend/access/transam/multixact.c === RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/multixact.c,v retrieving revision 1.27 diff -u -3 -p -r1.27 multixact.c --- src/backend/access/transam/multixact.c 1 Jan 2008 19:45:46 - 1.27 +++ src/backend/access/transam/multixact.c 30 Jul 2008 04:02:33 - @@ -57,6 +57,7 @@ #include "storage/lmgr.h" #include "utils/memutils.h" #include "storage/procarray.h" +#include "pg_trace.h" /* @@ -1497,8 +1498,10 @@ void ShutdownMultiXact(void) { /* Flush dirty MultiXact pages to disk */ + TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_START(false); SimpleLruFlush(MultiXactOffsetCtl, false); SimpleLruFlush(MultiXactMemberCtl, false); + TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(false); } /* @@ -1526,6 +1529,8 @@ MultiXactGetCheckptMulti(bool is_shutdow void CheckPointMultiXact(void) { + TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_START(true); + /* Flush dirty MultiXact pages to disk */ SimpleLruFlush(MultiXactOffsetCtl, true); SimpleLruFlush(MultiXactMemberCtl, true); @@ -1540,6 +1545,8 @@ CheckPointMultiXact(void) */ if (!InRecovery) TruncateMultiXact(); + + TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(true); } /* Index: src/backend/access/transam/subtrans.c =
Re: [HACKERS] Review: DTrace probes (merged version) ver_03
Tom Lane wrote: By "break" I meant "fail to function usefully". Yes, it would still compile, but if you don't have the fork number available then you won't be able to tell what's really happening in the buffer pool. You might as well not pass any of the buffer tag as pass only part of it. Got it. The issue is with Apple's dtrace implementation, not Xcode. For more info, please see the link below. http://www.opensolaris.org/jive/thread.jspa?messageID=252503𽩗 I think what this is complaining about is whether allegedly built-in typedefs like uintptr_t work. This is the message I tried to convey with the comment in probe.d, but I guess it was not clear. What we care about is different: can we write an explicit typedef in the .d file? Yes. I do not know if that worked in XCode 3.0 or not, but it seems to work fine in the version of dtrace shipped in 3.1. (And I'm perfectly fine with telling people that they can't compile Postgres dtrace support with less than the most recent tool set, especially since it'll be fairly old by the time 8.4 ships.) I tested on both Xcode 3.0 & 3.1 and both worked. -- Robert Lor Sun Microsystems Austin, USA http://sun.com/postgresql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: DTrace probes (merged version) ver_03
Tom Lane wrote: * The probes that pass buffer tag elements are already broken by the pending "relation forks" patch: there is soon going to be another field in buffer tags. Perhaps it'd be feasible to pass the buffer tag as a single probe argument to make that a bit more future-proof? I'm not sure if that would complicate the use of the probe so much as to be counterproductive. Are you referring to this struct? typedef struct buftag { RelFileNode rnode; /* physical relation identifier */ BlockNumber blockNum; /* blknum relative to begin of reln */ } BufferTag; I'm not familiar with this pending patch, but why would it break when another field is added? It's certainly possible to pass the buffer tag, but I'd say it's not a good idea if the other fields will never be used. * I find this to be truly bletcherous: ! /* ! * Due to a bug in Mac OS X 10.5, using defined types (e.g. uintptr_t, ! * uint32_t, etc.) cause compilation error. ! */ ! ! probe transaction__start(unsigned int transactionId); ! probe transaction__commit(unsigned int transactionId); ! probe transaction__abort(unsigned int transactionId); especially since some of the manual translations in the file are flat out wrong (Oid is unsigned for instance). Oops! Furthermore the comment is wrong, at least according to my tests with XCode 3.1. Typedefs seem to work fine. The issue is with Apple's dtrace implementation, not Xcode. For more info, please see the link below. http://www.opensolaris.org/jive/thread.jspa?messageID=252503𽩗 What I suggest might be a reasonable compromise is to copy needed typedefs directly into the probes.d file: typedef unsigned int LocalTransactionId; provider postgresql { probe transaction__start(LocalTransactionId); I like this solution. -- Robert Lor Sun Microsystems Austin, USA http://sun.com/postgresql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: DTrace probes (merged version) ver_03
Zdenek Kotala wrote: Alvaro Herrera napsal(a): Zdenek Kotala wrote: I performed review and I prepared own patch which contains only probes without any issue. I suggest commit this patch because the rest of patch is independent and it can be committed next commit fest after rework. I found following issues: I noticed that CLOG, Subtrans and Multixact probes are added during a regular Checkpoint, but not during a shutdown flush. I think the probes should count that too (probably with the same counter). Yeah, good catch. Ok. I think the names should be the same but pass a true/false argument to differentiate the call just like how it's used in SimpleLruFlush. e.g. TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(false) # shutdown case TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(true) In the pgstat_report_activity probe, is it good to call the probe before taking the fast path out? If you mean to put it behind "if (!pgstat_track_activities || !beentry)" then I think that current placement is OK. You should be possibility to track it without dependency on pgstat_track_activities GUC variable. Keep in mind that DTrace is designed to monitor/trace production system and ability to monitor something without any configuration change is main idea. This probe just prints out the statement, and it doesn't matter whether or not track_activities is enabled. In the BUFFER_READ_START probe, we do not include the smgrnblocks() call, which could be significant since it includes a number of system calls. It is because the BUFFER_READ_START needs to report correct blockno. My suggestion is to add probes to smgrblock function. Yes, that's the main reason. I think BUFFER_HIT and BUFFER_MISS should include the "isLocalBuf" flag. I also wonder whether BUFFER_HIT should be called in the block above, lines 220-238, where we check the "found" flag, i.e. if (isLocalBuf) { ReadLocalBufferCount++; bufHdr = LocalBufferAlloc(smgr, blockNum, &found); if (found) { LocalBufferHitCount++; TRACE_POSTGRESQL_BUFFER_HIT(true);/* local buffer */ } else { TRACE_POSTGRESQL_BUFFER_MISS(true);/* ditto */ } } else { ReadBufferCount++; /* * lookup the buffer. IO_IN_PROGRESS is set if the requested block is * not currently in memory. */ bufHdr = BufferAlloc(smgr, blockNum, strategy, &found); if (found) { BufferHitCount++; TRACE_POSTGRESQL_BUFFER_HIT(false);/* not local */ } else { TRACE_POSTGRESQL_BUFFER_MISS(false);/* ditto */ } } (note that this changes the semantics w.r.t. the isExtend flag). Good point. The question about isExted is if we want to have exact output include corner case or be to sync with ReadBufferCount counter. I prefer your suggested placement. Agree. I understand the desire to have DEADLOCK_FOUND, but is there really a point in having a DEADLOCK_NOTFOUND probe? Since this code runs every time someone waits for a lock longer than a second, there would be a lot of useless counts and nothing useful. No idea. Robert any comment? Yes, you're right. Will delete. I find it bogus that we include query rewriting in QUERY_PARSE_START/DONE. I think query rewriting should be a separate probe. agree Will fix. I was also thinking about having the probes by modules but wanted to limit the number of probes, but I'm fine having another one. QUERY_PLAN_START is badly placed -- it should be after the check for utility commands (alternatively there could be a QUERY_PLAN_DONE in the fast way out for utility commands, but in that case a "is utility" flag would be needed. I don't see that there's any point in tracing planning of utility commands though). agree Ok Why are there no probes for the v3 protocol stuff? There should be probes for Parse, Bind, Execute message processing too, for completeness. Thanks for catching this. Also, I wonder if these probes should be in the for(;;) loop in PostgresMain() instead of sprinkled in the other routines. I note that the probes in PortalRun and PortalRunMulti are schizophrenic about whether they include utility functions or not. As are as I can tell, the current probes in PortalRun/Multi don't include utility functions. Should they be included? I'll send the patch for the above changes tomorrow! -- Robert Lor Sun Microsystems Austin, USA http://sun.com/postgresql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: DTrace probes (merged version) ver_03
Zdenek Kotala wrote: However what I suggested is commit probes without issue now and the rest will be processed on the next commit fest after rework/discussion. Agreed. I'll fix up the remaining issues with the patch you sent. -- Robert Lor Sun Microsystems Austin, USA http://sun.com/postgresql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: DTrace probes (merged version)
Apologies for the delayed response - vacation, travel, etc got in the way! Zdenek Kotala wrote: I performed review of merged patch from Robert Treat. At first point the patch does not work (SunOS 5.11 snv_86 sun4u sparc SUNW,Sun-Fire-V240) The attached patch fixed the regression test errors. However I went through a code and I have following comments: 1) Naming convention: - Some probes uses "*end", some "*done". It would be good to select one name. Noted. Will use -done per the convention. This change will be included in an updated patch later since I think there are a number of other changes that need to be made. - I prefer to use clog instead of slru in probes name. clog is widely known. I think slru- is okay per your subsequent emails. - It seems to me better to have checkpoint-clog..., checkpoint-subtrans instead of clog-checkpoint. Yes, I was thinking about this too, but the current names are more consistent with the others. For example: buffer-checkpoint, buffer-* xlog-checkpoint, xlog-* - buffer-flush was originally dirty-buffer-write-start. I prefer Robert Lor's naming. Actually, I made this change so the name is consistent with the other buffer-* probes. 2) storage read write probes smgr-read*, smgr-writes probes are in md.c. I personally think it make more sense to put them into smgr.c. Only advantage to have it in md.c is that actual amount of bytes is possible to monitor. The current probes return number of bytes, that's why they are in md.c 3) query rewrite probe There are several probes for query measurement but query rewrite phase is missing. See analyze_and_rewrite or pg_parse_and_rewrite I believe the rewrite time is accounted for in the query-plan probe. Need to double check on this. 4) autovacuum_start Autovacuum_start probe is alone. I propose following probes for completeness: proc-autovacuum-start proc-autovacuum-stop proc-bgwriter-start proc-bgwriter-stop proc-backend-start proc-backend-stop proc-master-start proc-master-stop Saw a number of emails on this. Will get back on this. 5) Need explain of usage: I have some doubts about following probes. Could you please explain usage of them? example dtrace script is welcome - all exec-* probes - mark-dirty, local-mark-dirty Theo/Robert, do you guys have any sample scripts on the probes you added. 6) several comments about placement: I published patch on http://reviewdemo.postgresql.org/r/25/. I added several comments there. 7) SLRU/CLOG SLRU probes could be return more info. For example if page was in buffer or if physical write is not necessary and so on. Yes, more info could be returned if we can identify specific use cases that the new data will enable. -- Robert Lor Sun Microsystems Austin, USA http://sun.com/postgresql Index: src/backend/access/transam/clog.c === RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/clog.c,v retrieving revision 1.46 diff -u -3 -p -r1.46 clog.c --- src/backend/access/transam/clog.c 1 Jan 2008 19:45:46 - 1.46 +++ src/backend/access/transam/clog.c 21 Jul 2008 18:14:48 - @@ -36,6 +36,7 @@ #include "access/slru.h" #include "access/transam.h" #include "postmaster/bgwriter.h" +#include "pg_trace.h" /* * Defines for CLOG page sizes. A page is the same BLCKSZ as is used @@ -323,7 +324,9 @@ void CheckPointCLOG(void) { /* Flush dirty CLOG pages to disk */ + TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(); SimpleLruFlush(ClogCtl, true); + TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE(); } Index: src/backend/access/transam/multixact.c === RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/multixact.c,v retrieving revision 1.27 diff -u -3 -p -r1.27 multixact.c --- src/backend/access/transam/multixact.c 1 Jan 2008 19:45:46 - 1.27 +++ src/backend/access/transam/multixact.c 21 Jul 2008 18:21:58 - @@ -57,6 +57,7 @@ #include "storage/lmgr.h" #include "utils/memutils.h" #include "storage/procarray.h" +#include "pg_trace.h" /* @@ -1526,6 +1527,8 @@ MultiXactGetCheckptMulti(bool is_shutdow void CheckPointMultiXact(void) { + TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_START(); + /* Flush dirty MultiXact pages to disk */ SimpleLruFlush(MultiXactOffsetCtl, true); SimpleLruFlush(MultiXactMemberCtl, true); @@ -1540,6 +1543,8 @@ CheckPointMultiXact(void) */ if (!InRecovery) TruncateMultiXact(); + + TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(); } /* Index: src/backend/access/transam/slru.c === RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/slru.c,v retrieving revision
Re: [HACKERS] New DTrace probes proposal
Robert Treat wrote: While it would be nice to have a clean merge of the two, it's probably simple enough to just re-implement the differences into your patch (since yours already compiles on 8.4). Should be straightforward ... I can do the merge. As far as naming scheme, I'm not particularly wedded to either... is there a dtrace naming convention that could be followed? Yep, and the probes I submitted pretty much follow the suggested naming convention. -Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Overhauling GUCS
Robert Treat wrote: On Wednesday 04 June 2008 22:04:54 Greg Smith wrote: I was just talking to someone today about building a monitoring tool for this. Not having a clear way to recommend people monitor use of work_mem and its brother spilled to disk sorts is an issue right now, I'll whack that one myself if someone doesn't beat me to it before I get time. I remember *years* ago, someone wrote a perl script to poll pgsql_tmp and print out anytime something showed up... you could probably find that in the archives if you look around. of course to me this sounds like an excellent idea for a dtrace probe ;-) Actually, you can find out from the sort-end probe now whether or not the sort spilled to disk and number of disk blocks used. This is one of the probes from Simon. TRACE_POSTGRESQL_SORT_END(state->tapeset, (state->tapeset ? LogicalTapeSetBlocks(state->tapeset) : (state->allowedMem - state->availMem + 1023) / 1024)); -Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New DTrace probes proposal
Robert Treat wrote: certainly by the time 8.4 ships, these should work with freebsd I'd think. ideally we would need to confirm this by release time, certainly getting a bsd buildfarm member to compile with them would be a start (and very unlikely to cause issues) As soon as the DTrace port is working on FreeBSD, I will confirm that the probes are working properly, and it's definitely a good idea to get a buildfarm machine building with --enable-dtrace. One thing I didnt understand after looking at this was... * Probes to measure query time query-parse-start (int, char *) I would have guessed that the arguments might be pid and query string, but looking at the probes, I see it defined as: TRACE_POSTGRESQL_QUERY_PARSE_START(query_string); which doesn't seem to match up... can you explain that piece? Having the pid passed as an argument was my original intention, but it's actually redundant since the pid is readily available from the script, so I will fix the other probes with pid as args. Overall, I like the probes you have breaking down query parsing/planning/executing, though I like ours for measuring autovacuum pieces, so I think the end game should be to just merge the two patches together (barring any place there is direct conflict)... do you see any issues with that? Yes, to avoid confusion, the probes should be merged and resubmitted as one patch. Have yours been ported to 8.4 yet? We also need to make sure the names and arg types are consistent, probably should work on this offline. One other questions would be what to do with the dtrace scripts. I think having a set of these available is a large boon for dtrace users, but do you see that as something that needs to be distriubuted with the core? I don't see the need to include the scripts with core now, maybe some point in the future if it makes sense. I'd lean towards reviving the dtrace project on pgfoundry, but it might be worth expanding the dynamic tracing chapter to include more examples and a pointer to pgfoundry. Agreed on both. I will add the new scripts to the dtrace project on PgFoundry and add more info to the doc. I think you guys have some interesting scripts as well that folks will find useful. -Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Overhauling GUCS
Tom Lane wrote: If those aren't enough questions, what else must we ask? Or maybe they aren't the right questions at all --- maybe we should ask "is this a dedicated machine or not" and try to extrapolate everything else from what we (hopefully) can find out about the hardware. I think probably a combination of high and low-level questions and make the low-level (more specific) questions optional since some users may not be able to provide low-level info. Here's a rough idea of how I envision this tool should work. $ pg_config_wizard Is this machine dedicated to Postgres? (y/n) n (now tool auto-discovers available HW resources) Your system has 32GB memory. What percentage do you want to allocate to Postgres? (1=50%, 2=33%, 3=25%, etc) 1 What type of workload? (OLTP, DSS, etc) ... At the end, the tool runs for a while checking to see if certain thresholds are reached to determine which parameters need to be increased. The tool would change the parameters causing the bottlenecks, rerun Postgres/workload, and iterate a few times until the results are satfisfactory. Write the recommended settings to postgresql.conf.recommend and let the user update postgresql.conf himself or whatever. I update my postgresql.conf, rerun the app and see 100% increased in throughput :-) -Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Overhauling GUCS
Tom Lane wrote: This is even assuming that the tool needs to edit the file itself, rather than just give advice. The advice is the hard part, folks; could we stop obsessing about trivia? +1. IMHO, the tool doesn't need to worry about generating a prettied version of postgresql.conf. It should just inform the user about the appropriate settings or create a postgresql.conf.recommend and have the user update postgresql.conf himself. -Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Overhauling GUCS
Steve Atkins wrote: I'd be interested in putting together a framework+GUI client to do this cleanly in a cross-platform (Windows, Linux, Solaris, OS X as a bare minimum) sort of way, if no-one else already has such a thing. This is a great idea, and I was thinking along the same line. The framework can provide generic interfaces for the GUI/Web client, and leave it up to the OS to provide the needed data. -Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New DTrace probes proposal
Greg Smith wrote: There's also a big DTrace probe set patch available from OmniTI: https://labs.omniti.com/project-dtrace/trunk/postgresql/ http://labs.omniti.com/trac/project-dtrace/wiki/Applications#PostgreSQL I don't know if you've looked at that before. There's some overlap but many unique and handy probes to each set. I think it would be nice to consider a superset union of the two. I heard about Theo's probes recently, but I haven't had a chance to look at them closely. I'm more than happy to adapt his probes for 8.4 and remove any duplicates if there are no objections from Theo. I would guess OmniTI would be glad to have their set assimilated into core as well so they don't have to maintain their patch past 8.3; I'd think so too! -Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] New DTrace probes proposal
(Resending since it didn't work the first time. Not sure if attaching a tar file was the culprit.) I'd like to propose adding the following probes (some of which came from Simon) to 8.4. I think these probe provide very useful data. Although some of the data can be collected now, the main advantages with probes, among others, are (1) they are always available and can be enabled only when needed especially in production (2) different combinations of probes can be used together to collect interesting data. They work on OS X Leopard & Solaris now, and hopefully on FreeBSD soon. Preliminary patch attached along with sample DTrace scripts. -Robert --- * Probes to measure query time query-parse-start (int, char *) query-parse-done (int, char *) query-plan-start () query-plan-done () query-execute-start () query-execute-done () query-statement-start (int, char *) query-statement-done (int, char *) * Probes to measure dirty buffer writes by the backend because bgwriter is not effective dirty-buffer-write-start (int, int, int, int) dirty-buffer-write-done (int, int, int, int) * Probes to measure physical writes from the shared buffer buffer-write-start (int, int, int, int) buffer-write-done (int, int, int, int, int) * Probes to measure reads of a relation from a particular buffer block buffer-read-start (int, int, int, int, int) buffer-read-done (int, int, int, int, int, int) * Probes to measure the effectiveness of buffer caching buffer-hit () buffer-miss () * Probes to measure I/O time because wal_buffers is too small wal-buffer-write-start () wal-buffer-write-done () * Probes to measure checkpoint stats such as running time, buffers written, xlog files added, removed, recycled, etc checkpoint-start (int) checkpoint-done (int, int, int, int, int) * Probes to measure Idle in Transaction and client/network time idle-transaction-start (int, int) idle-transaction-done () * Probes to measure sort time sort-start (int, int, int, int, int) sort-done (int, long) * Probes to determine whether or not the deadlock detector has found a deadlock deadlock-found () deadlock-notfound (int) * Probes to measure reads/writes by block numbers and relations smgr-read-start (int, int, int, int) smgr-read-end (int, int, int, int, int, int) smgr-write-start (int, int, int, int) smgr-write-end (int, int, int, int, int, int) Index: backend/access/transam/xlog.c === RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xlog.c,v retrieving revision 1.295 diff -u -3 -p -r1.295 xlog.c --- backend/access/transam/xlog.c 25 Mar 2008 22:42:42 - 1.295 +++ backend/access/transam/xlog.c 16 May 2008 18:37:22 - @@ -50,6 +50,7 @@ #include "utils/builtins.h" #include "utils/pg_locale.h" #include "utils/ps_status.h" +#include "pg_trace.h" /* File path names (all relative to $PGDATA) */ @@ -1258,12 +1259,14 @@ AdvanceXLInsertBuffer(bool new_segment) * Have to write buffers while holding insert lock. This is * not good, so only write as much as we absolutely must. */ +TRACE_POSTGRESQL_WAL_BUFFER_WRITE_START(); WriteRqst.Write = OldPageRqstPtr; WriteRqst.Flush.xlogid = 0; WriteRqst.Flush.xrecoff = 0; XLogWrite(WriteRqst, false, false); LWLockRelease(WALWriteLock); Insert->LogwrtResult = LogwrtResult; +TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DONE(); } } } @@ -5665,6 +5668,8 @@ CreateCheckPoint(int flags) MemSet(&CheckpointStats, 0, sizeof(CheckpointStats)); CheckpointStats.ckpt_start_t = GetCurrentTimestamp(); + TRACE_POSTGRESQL_CHECKPOINT_START(flags); + /* * Use a critical section to force system panic if we have trouble. */ @@ -5726,6 +5731,7 @@ CreateCheckPoint(int flags) LWLockRelease(WALInsertLock); LWLockRelease(CheckpointLock); END_CRIT_SECTION(); + TRACE_POSTGRESQL_CHECKPOINT_DONE(UINT_MAX,0,0,0,UINT_MAX); return; } } @@ -5945,6 +5951,11 @@ CreateCheckPoint(int flags) if (log_checkpoints) LogCheckpointEnd(); + TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written, +NBuffers, CheckpointStats.ckpt_segs_added, +CheckpointStats.ckpt_segs_removed, +CheckpointStats.ckpt_segs_recycled); + LWLockRelease(CheckpointLock); } Index: backend/storage/buffer/bufmgr.c === RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v retrieving revision 1.228 diff -u -3 -p -r1.228 bufmgr.c --- backend/storage/buffer/bufmgr.c 1 Jan 2008 19:45:51 - 1.228 +++ backend/storage/buffer/bufmgr.c 16 May 2008 18:37:22 - @@ -42,6 +42,7 @@ #include "storage/smgr.h" #include "utils/resowner.h" #include "pgstat.h" +#include "pg_trace.h" /* Note: these two macros only work on shared buffers, not local ones! */ @@ -171,6 +172,7 @@ ReadBuffer_common(Relation reln, BlockNu if (isExtend) blockNum = smgrnblocks(reln->rd_smgr);
Re: [HACKERS] [COMMITTERS] pgsql: Enable probes to work with Mac OS X Leopard and other OSes that
Peter Eisentraut wrote: Am Dienstag, 18. März 2008 schrieb Tom Lane: Well, I've got Leopard here, I'd be happy to test it ... but the patch has rendered http://developer.postgresql.org/pgdocs/postgres/dynamic-trace.html into a pack of lies quite independently of which OSes are supported, so I'm not very sure what to do. Ah yes. Robert, do you think you could update the documentation a bit on how to use the tracing? Yes, the doc needs to be updated. Will submit a patch soon! Regards, -Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Enable probes to work with Mac OS X Leopard and other OSes that
Peter Eisentraut wrote: Well, yes. I meant to say, a build system that can supply the functionality of Gen_fmgrtab can surely implement this new thing. I see there is Perl being used, so it should be simple. I was thinking of using a Perl script to generate the dummy header file but decided against it to avoid disrupting the build on other platforms. If sed doesn't work on Windows for some reason, we can use a Perl script instead. Regards, -Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest process
Tom Lane wrote: This is reasonable for the sort of medium-to-large patch that the author has put a lot of time into. But we also get a lot of small one-off patches where it's not so reasonable. Now of course many of those get applied right away, but not all. One of the services that Bruce's patch queue has always performed is making sure stuff like that doesn't fall through the cracks. I don't think a purely author-driven patch queue will work to ensure that. We could combine the ideas: encourage authors to use the wiki, but have someone (probably Bruce ;-)) in charge of adding stuff to the wiki if the author doesn't. Stefan Kaltenbrunner said a script can be written to insert an entry into the wiki automatically when a new patch comes in. That would be ideal and would avoid manual intervention. The author can still update the wiki as needed. http://archives.postgresql.org/pgsql-hackers/2008-02/msg00433.php Regards, -Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest process
Heikki Linnakangas wrote: The main point of my proposal is: let's make the *authors* who want their stuff to be reviewed as part of a commitfest do the extra work. There would be no extra work required for patch reviewers. I agree with Heikki that for the process to be successful, it should not impose extra work for the reviewers. The author is more motivated to get his/her patch in than the review to review the patch. Besides, I don't think it's too much to ask to have the author enter one line into a wiki and keep the status up to date for the duration of the commitfest. Sure, we can refine that later. making it easier for patch authors as well, but I don't think it's an unreasonable amount of work to keep one line per patch up-to-date in a wiki. The line doesn't need to contain anything else than title of patch, name of the author, and links to latest patch and relevant discussion threads, if applicable. And commitfests are short, you only need to update the wiki a couple of times during the commitfest. I think it's a good idea to try out the process with a wiki first, and if it works well, consider a more sophisticated tool if needed. How about use the following fields for the wiki page and have the author be responsible for keeping it up to date? Patch title Patch URL Discussion URL (optional) Author Reviewer (updated by reviewer or author) Patch Status (awaiting review -> reviewing -> ... -> accepted | rejected | whatever) Regards, -Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposed changes to DTrace probe implementation
Tom Lane wrote: The concern I've got about this is basically that it would encourage plastering the same label on subtly different counts, leading to confusion and perhaps mistaken conclusions. I would prefer to see any common probes be reverse-engineered *after the fact*, ie, after you've already instrumented several DB's you're in a better position to figure out what's common and what's not. I distrust preconceived notions about that. Your point is well taken, and we can revisit this later! Regards, -Robert ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Proposed changes to DTrace probe implementation
Gregory Stark wrote: I think both types of probes are useful to different people. I think certain higher level probes can be really useful to DBAs. Perhaps looking at the standard database SNMP MIB counters would give us a place to start for upward facing events people want to trace for databases in general. Great idea. I found this link for public RDBMS MIB http://www.mnlab.cs.depaul.edu/cgi-bin/sbrowser.cgi?HOST=&OID=RDBMS-MIB!rdbmsMIB The stats in rdbmsSrvInfoTable is quite useful, and it's one of the tables that Oracle implements in their SNMP support. http://download-east.oracle.com/docs/cd/B14099_19/manage.1012/b16244/appdx_d_rdbms.htm Regards, -Robert ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Proposed changes to DTrace probe implementation
Tom Lane wrote: I'm unimpressed; it's not at all clear that you'd be measuring quite the same thing in, say, mysql as in postgres. I think it depends on the probe, but for transaction rates like start or commit, don't you think it's pretty black & white as long as the probes are placed in the correct locations. Possibly I have a different view of the uses of dtrace than you do, but most of the events I'd be interested in probing are probably pretty Postgres-specific. I agree that most probes, especially low level ones, will be Postgres specific. I think distinguishing half a dozen of them on the assumption that there should be (exact) matches to that probe point in most databases is misleading and a source of useless extra notation. This is just forward looking on my part since other OS databases will most likely be adding Dtrace probes as well. I'll put them back together in one provider for now! Regards, -Robert ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Proposed changes to DTrace probe implementation
Tom Lane wrote: I'm unconvinced that there will be any probes that are common to all databases. I'd skip this part... Any reason why we can't consider probes like transaction-start, transaction-commit, or transaction-abort as common probes that can also be used in other (maybe no all) databases? We are only talking about the probe definition here as shown below, not how they will be implemented. probe transaction__start(int); probe transaction__commit(int); probe transaction__abort(int); Regards, -Robert ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Proposed changes to DTrace probe implementation
Robert Lor wrote: Proposed changes: * Switch from using DTRACE_PROBEn macros to the dynamically generated macros (remove pg_trace.h) * Use "dtrace -h" to create a header file that contains the dynamically generated macros to be used in the source code instead of the DTRACE_PROBEn macros. * Create a new header file called probes_null.h to map the generated macros to no-ops * This is unrelated to making DTrace work on Leopard, but I'd like to propose that we split the probes into generic database and Postgres specific providers, called "database" and "postgresql" respectively. Other databases will be adding DTrace probes as well, and we want to use "database" as the provider for probes that are common to all databases. This way scripts that use the common provider will work on databases that implement the common probes. Since I haven't heard any objections, I assume the proposed changes are acceptable. I'll go a head and submit the patch. Regards, -Robert ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Proposed changes to DTrace probe implementation
Robert Lor <[EMAIL PROTECTED]> wrote: > Peter, > > Peter Eisentraut <[EMAIL PROTECTED]> wrote: > > Could you please use diff -c or -u for the patch? It's quite hard > to > > read this way. > > Attached is the patch with diff -c. > Oops, here's the real attachment! Regards, -Robert dtrace.patch Description: Binary data ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Proposed changes to DTrace probe implementation
Peter, Peter Eisentraut <[EMAIL PROTECTED]> wrote: > Could you please use diff -c or -u for the patch? It's quite hard to > read this way. Attached is the patch with diff -c. Thanks, -Robert ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] Proposed changes to DTrace probe implementation
Motivation: To enable probes to work with Mac OS X Leopard and other OSes that will support DTrace in the future. Bug filed on this issue http://archives.postgresql.org/pgsql-bugs/2007-10/msg00201.php The problem: There are two different mechanisms to implementing application level probes in DTrace as explained here http://blogs.sun.com/ahl/entry/user_land_tracing_gets_better. I originally chose the first mechanism (using DTRACE_PROBEn macros) for simplicity, but Leopard's DTrace implementation only supports the second mechanism (using macros from dtrace generated header file). The second mechanism was introduced as an enhancement, and one of the benefits is type checking. Proposed changes: * Switch from using DTRACE_PROBEn macros to the dynamically generated macros (remove pg_trace.h) * Use "dtrace -h" to create a header file that contains the dynamically generated macros to be used in the source code instead of the DTRACE_PROBEn macros. * Create a new header file called probes_null.h to map the generated macros to no-ops * This is unrelated to making DTrace work on Leopard, but I'd like to propose that we split the probes into generic database and Postgres specific providers, called "database" and "postgresql" respectively. Other databases will be adding DTrace probes as well, and we want to use "database" as the provider for probes that are common to all databases. This way scripts that use the common provider will work on databases that implement the common probes. With the proposed changes, the steps for adding new probes are slightly different, but the ways the probes are used will not change. Steps for adding new probes now: 1) Add probe name to probes.d 2) Add probe to source code using one of PG_TRACEn macros 3) Recompile Steps for adding probes with proposed changes: 1) Add probe name to probes.d (probes.h will be created at build time) 2) Add null macros to probes_null.h 3) Add probe to source code using the dynamically generated macro from probes.h 4) Recompile Preliminary patch is attached. There is one known issue with the patch. When compiling outside of the source tree, probes.d needs to be symlinked to the source tree. I haven't figured out how to copy the file to the build tree yet. I'm sure this is trivial for you gurus out there! Comments? Regards, -Robert ? src/include/probes_null.h Index: src/Makefile === RCS file: /projects/cvsroot/pgsql/src/Makefile,v retrieving revision 1.42 diff -r1.42 Makefile 16a17,19 > ifeq ($(enable_dtrace), yes) > $(DTRACE) -h -s $(top_builddir)/src/backend/utils/probes.d -o > $(top_builddir)/src/include/probes.h > endif Index: src/backend/Makefile === RCS file: /projects/cvsroot/pgsql/src/backend/Makefile,v retrieving revision 1.125 diff -r1.125 Makefile 22a23 > ifeq ($(PORTNAME), solaris) 25a27 > endif 143a146 > ifeq ($(PORTNAME), solaris) 145a149 > endif Index: src/backend/access/transam/xact.c === RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xact.c,v retrieving revision 1.257 diff -r1.257 xact.c 1482c1482 < PG_TRACE1(transaction__start, vxid.localTransactionId); --- > DATABASE_TRANSACTION_START(vxid.localTransactionId); 1607c1607 < PG_TRACE1(transaction__commit, MyProc->lxid); --- > DATABASE_TRANSACTION_COMMIT(MyProc->lxid); 1993c1993 < PG_TRACE1(transaction__abort, MyProc->lxid); --- > DATABASE_TRANSACTION_ABORT(MyProc->lxid); Index: src/backend/storage/lmgr/lock.c === RCS file: /projects/cvsroot/pgsql/src/backend/storage/lmgr/lock.c,v retrieving revision 1.181 diff -r1.181 lock.c 790c790 < PG_TRACE2(lock__startwait, locktag->locktag_field2, lockmode); --- > POSTGRESQL_LOCK_STARTWAIT(locktag->locktag_field2, lockmode); 794c794 < PG_TRACE2(lock__endwait, locktag->locktag_field2, lockmode); --- > POSTGRESQL_LOCK_ENDWAIT(locktag->locktag_field2, lockmode); Index: src/backend/storage/lmgr/lwlock.c === RCS file: /projects/cvsroot/pgsql/src/backend/storage/lmgr/lwlock.c,v retrieving revision 1.50 diff -r1.50 lwlock.c 450c450 < PG_TRACE2(lwlock__startwait, lockid, mode); --- > POSTGRESQL_LWLOCK_STARTWAIT(lockid, mode); 461c461 < PG_TRACE2(lwlock__endwait, lockid, mode); --- > POSTGRESQL_LWLOCK_ENDWAIT(lockid, mode); 472c472 < PG_TRACE2(lwlock__acquire, lockid, mode); --- > POSTGRESQL_LWLOCK_ACQUIRE(lockid, mode); 543c543 < PG_TRACE2(lwlock__condacquire__fail, lockid, mode); --- > POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(lockid, mode); 549c549 < PG_TRACE2(lwlock__condacquire, lockid, mode); ---
Re: [HACKERS] EXPLAIN ANALYZE printing logical and hardware I/O per-node
Greg, Gregory Stark wrote: I don't think DTrace is overkill either. The programmatic interface is undocumented (but I've gotten Sun people to admit it exists -- I just have to reverse engineer it from the existing code samples) but should be more or less exactly what we need. You probably know this already. There are existing commands that use the programmatic interface and would provide a good starting point. Here are a couple: http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/lockstat/lockstat.c http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/dtrace/dtrace.c One of my colleagues is in the process of putting a tutorial together for how to do this, so if you decided to pursue this approach and need assistance, please let me know. Regards, Robert ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [PATCHES] Generic Monitoring Framework with DTrace patch
Peter Eisentraut wrote: Robert Treat wrote: Also should installation.sgml mention the issueswith building 32 vs 64 bit binaries I'm not convinced there is an issue. dtrace will build the right binaries by default. If you're messing with mixed environments *and* delve into dtrace, you should probably be able to figure this out yourself. None that I'm aware of. and/or the issue with static functions? The issue with that is simply that there is no released operating system version for which the dtrace support works. I'm not sure what to do about that. This is a very temporary issue, and it will just require PostgreSQL to be built on the lastest version of Solaris (e.g Solaris Express), but the binary can will run just fine on the FCS version (e.g. Solaris 10). This will be clarified in the doc patch. -Robert ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [PATCHES] Generic Monitoring Framework with DTrace patch
Sorry for the delayed response. Robert Treat wrote: Looking through -patches I don't see the doc patch, and outside of installation.sgml there doesn't seem to be anything either. Robert, are you still on the hook for these? Josh will help submit the doc patch. I have documented the usage instructions in a couple of places but just have been too busy to get the patch submitted. My bad. http://pgfoundry.org/docman/?group_id=1000163 http://blogs.sun.com/robertlor Also should installation.sgml mention the issues with building 32 vs 64 bit binaries and/or the issue with static functions? There are no issues with building 32 and 64 bit binaries. The above doc explains the issue with static function. Regards, -Robert ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] LWLock statistics collector
Tom Lane wrote: I think the actual wave of the future for analyzing behavior at the LWLock level is going to be DTrace. It seems way more flexible than an aggregate-statistics view can be. CVS head now has the following LWLock probes, and more can easily be added. These probes can be enabled using the sample DTrace scripts at http://pgfoundry.org/projects/dtrace/ lwlock-acquire lwlock-release lwlock-startwait lwlock-endwait lwlock-condacquire lwlock-condacquire-fail If anyone wants to have access to a Solaris machine to play with DTrace, let me know. Regards, -Robert ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] DTrace enabled build fails
Bruce Momjian wrote: Do we need to add detection logic to catch buggy versions? Instead of adding extra logic, I think it's sufficient with documentation since the issue will soon be fixed in the next Solaris update. Regards, -Robert ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] DTrace enabled build fails
Peter Eisentraut wrote: That rings a bell. Can we get a more precise designation on what version of DTrace we support? And where can one get that required update? Peter, The problem with static function was fixed recently and is now available in Solaris Express (the development version of Solaris). You can get the bits from http://www.sun.com/software/solaris/solaris-express/get.jsp. I forgot to mention this know issue in my previous emails! I was told by the DTrace engineer that this fix will be in the next update of Solaris 10. Regards, -Robert ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [PATCHES] Generic Monitoring Framework with DTrace patch
Excellent! I'll submit a doc patch shortly. Regards, -Robert Peter Eisentraut wrote: I've committed the dtrace patch. Some documentation would be nice now ... ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Tom Lane wrote: Hmmm ... AFAICS this must mean that flushing the WAL data to disk at transaction commit time takes (most of) 20 msec on your hardware. Which still seems high --- on most modern disks that'd be at least two disk revolutions, maybe more. What's the disk hardware you're testing on, particularly its RPM spec? I actually ran the test on my laptop. It has an Ultra ATA/100 drive (5400 rpm). The test was just a quickie to show some data from the probes. I'll collect and share data from the T2000 server later. Regards, -Robert ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [PATCHES] Generic Monitoring Framework with DTrace patch
Peter Eisentraut wrote: As I understand this, the probe file is compiled into some sort of object file which is linked into the binary. Correct. So if we ever have probes in other components, we'd probably want to have separate probe source and object files for them. That would seem better than one big probe file that is linked in everywhere. We agreed that there would only be one provider called postgresql, and I believe (need to double check) all the probes have to be defined in the same provider in the same file. What you suggest sounds like a good way to separate probes from different components. Regards, -Robert ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Tom Lane wrote: Yeah, those seem plausible, although the hold time for CheckpointStartLock seems awfully high --- about 20 msec per transaction. Are you using a nonzero commit_delay? I didn't change commit_delay which defaults to zero. Regards, -Robert ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [PATCHES] Generic Monitoring Framework with DTrace patch
Peter Eisentraut wrote: Robert Lor wrote: Now I'm getting a different type of error. After running the patch command, the configure file is partially patched but not the others. Attached is configure.rej. I just checked out the source from CVS. Sorry, there must be something wrong with your source code or patch tools, or the patch is mangled by email. I'm not sure. I was able to apply the patch after splitting it into individual files, ran dos2unix on each, and removed the diff line. I believe the problem is with the patch tool. Anyways, I tested the patch with --enable-dtrace and/or DTRACEFLAGS, and it works like a charm! If DTRACEFLAGS is not use, dtrace will default to 32 bit which is what we want. When building a 64 bit binary with --enable-dtrace, DTRACEFLAGS needs to be set to '-64'. I will submit a doc patch to explain all of the above. There is one minor issue - "gmake clean" doesn't remove probe.o. Regarding where to put probe.d, will src/probe.d work since probes for all subsystems will go into this one file. Thanks a bunch Peter! Regards, -Robert ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [PATCHES] Generic Monitoring Framework with DTrace patch
Peter Eisentraut wrote: Perhaps the attached patch in -c format will work better. Now I'm getting a different type of error. After running the patch command, the configure file is partially patched but not the others. Attached is configure.rej. I just checked out the source from CVS. -bash-3.00$ patch -p0 < /var/tmp/dtrace-patch-c.diff Looks like a context diff to me... Hunk #1 failed at line 314. Hunk #13 failed at line 19. 2 out of 33 hunks failed: saving rejects to ./configure.rej done Regards, -Robert *** *** 314,320 # include #endif" ! ac_subst_vars='SHELL PATH_SEPARATOR PACKAGE_NAME PACKAGE_TARNAME PACKAGE_VERSION PACKAGE_STRING PACKAGE_BUGREPORT exec_prefix prefix program_transform_name bindir sbindir libexecdir datadir sysconfdir sharedstatedir localstatedir libdir includedir oldincludedir infodir mandir build_alias host_alias target_alias DEFS ECHO_C ECHO_N ECHO_T LIBS configure_args build build_cpu build_vendor build_os host host_cpu host_vendor host_os PORTNAME docdir enable_nls WANTED_LANGUAGES default_port enable_shared enable_rpath enable_debug CC CFLAGS LDFLAGS CPPFLAGS ac_ct_CC EXEEXT OBJEXT CPP GCC TAS autodepend INCLUDES enable_thread_safety with_tcl with_perl with_python with_krb5 krb_srvtab with_pam with_ldap with_bonjour with_openssl with_zlib EGREP ELF_SYS LDFLAGS_SL AWK FLEX FLEXFLAGS LN_S LD with_gnu_ld ld_R_works RANLIB ac_ct_RANLIB TAR STRIP ac_ct_STRIP STRIP_STATIC_LIB STRIP_SHARED_LIB YACC YFLAGS PERL perl_archlibexp perl_privlibexp perl_useshrplib perl_embed_ldflags PYTHON python_version python_configdir python_includespec python_libdir python_libspec python_additional_libs HAVE_IPV6 LIBOBJS acx_pthread_config PTHREAD_CC PTHREAD_LIBS PTHREAD_CFLAGS HAVE_POSIX_SIGNALS MSGFMT MSGMERGE XGETTEXT localedir TCLSH TCL_CONFIG_SH TCL_INCLUDE_SPEC TCL_LIB_FILE TCL_LIBS TCL_LIB_SPEC TCL_SHARED_BUILD TCL_SHLIB_LD_LIBS NSGMLS JADE have_docbook DOCBOOKSTYLE COLLATEINDEX SGMLSPL vpath_build LTLIBOBJS' ac_subst_files='' # Initialize some variables set by options. --- 314,320 # include #endif" ! ac_subst_vars='SHELL PATH_SEPARATOR PACKAGE_NAME PACKAGE_TARNAME PACKAGE_VERSION PACKAGE_STRING PACKAGE_BUGREPORT exec_prefix prefix program_transform_name bindir sbindir libexecdir datadir sysconfdir sharedstatedir localstatedir libdir includedir oldincludedir infodir mandir build_alias host_alias target_alias DEFS ECHO_C ECHO_N ECHO_T LIBS configure_args build build_cpu build_vendor build_os host host_cpu host_vendor host_os PORTNAME docdir enable_nls WANTED_LANGUAGES default_port enable_shared enable_rpath enable_debug DTRACE DTRACEFLAGS enable_dtrace CC CFLAGS LDFLAGS CPPFLAGS ac_ct_CC EXEEXT OBJEXT CPP GCC TAS autodepend INCLUDES enable_thread_safety with_tcl with_perl with_python with_krb5 krb_srvtab with_pam with_ldap with_bonjour with_openssl with_zlib EGREP ELF_SYS LDFLAGS_SL AWK FLEX FLEXFLAGS LN_S LD with_gnu_ld ld_R_works RANLIB ac_ct_RANLIB TAR STRIP ac_ct_STRIP STRIP_STATIC_LIB STRIP_SHARED_LIB YACC YFLAGS PERL perl_archlibexp perl_privlibexp perl_useshrplib perl_embed_ldflags PYTHON python_version python_configdir python_includespec python_libdir python_libspec python_additional_libs HAVE_IPV6 LIBOBJS acx_pthread_config PTHREAD_CC PTHREAD_LIBS PTHREAD_CFLAGS HAVE_POSIX_SIGNALS MSGFMT MSGMERGE XGETTEXT localedir TCLSH TCL_CONFIG_SH TCL_INCLUDE_SPEC TCL_LIB_FILE TCL_LIBS TCL_LIB_SPEC TCL_SHARED_BUILD TCL_SHLIB_LD_LIBS NSGMLS JADE have_docbook DOCBOOKSTYLE COLLATEINDEX SGMLSPL vpath_build LTLIBOBJS' ac_subst_files='' # Initialize some variables set by options. *** *** 19,25 SUBSYSOBJS := $(DIRS:%=%/SUBSYS.o) ! OBJS := $(SUBSYSOBJS) $(top_builddir)/src/port/libpgport_srv.a # We put libpgport into OBJS, so remove it from LIBS LIBS := $(filter-out -lpgport, $(LIBS)) --- 19,29 SUBSYSOBJS := $(DIRS:%=%/SUBSYS.o) ! OBJS = $(SUBSYSOBJS) $(top_builddir)/src/port/libpgport_srv.a ! ! ifeq ($(enable_dtrace), yes) ! OBJS += probes.o ! endif # We put libpgport into OBJS, so remove it from LIBS LIBS := $(filter-out -lpgport, $(LIBS)) ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [PATCHES] Generic Monitoring Framework with DTrace patch
Peter Eisentraut wrote: Here is a consolidated patch that contains all the files. I made some configure and makefile adjustments and put standard comment headers in all the files. You can use DTRACEFLAGS to pass options to configure, which should help sorting out the 32/64-bit issue. The problem of the *.d files is already gone in CVS. Since I don't have access to a Solaris system, this is untested for the DTrace-enabled case. The only thing left to do besides actually testing that case would be moving the probes.d file to a different location, since we probably don't want to have special-purpose files in src/backend. I can't seem to apply the patch on Solaris. I ran the patch command from the top of the src tree and got the following messages. When I tried to enter a file name, it goes into infinite loop. What did I do wrong? -bash-3.00$ patch -p0 -u < /var/tmp/dtrace-patch.diff Looks like a unified context diff. The next patch looks like a unified context diff. The next patch looks like a unified context diff. The next patch looks like a unified context diff. The next patch looks like a unified context diff. File to patch: Regards, -Robert ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [PATCHES] Generic Monitoring Framework with DTrace patch
Peter, I'll test the patch on Solaris. Thanks! Regards, -Robert Peter Eisentraut wrote: Here is a consolidated patch that contains all the files. I made some configure and makefile adjustments and put standard comment headers in all the files. You can use DTRACEFLAGS to pass options to configure, which should help sorting out the 32/64-bit issue. The problem of the *.d files is already gone in CVS. Since I don't have access to a Solaris system, this is untested for the DTrace-enabled case. The only thing left to do besides actually testing that case would be moving the probes.d file to a different location, since we probably don't want to have special-purpose files in src/backend. diff -uNr ../cvs-pgsql/configure ./configure --- ../cvs-pgsql/configure 2006-07-21 23:35:48.0 +0200 +++ ./configure 2006-07-22 01:21:54.0 +0200 @@ -314,7 +314,7 @@ # include #endif" -ac_subst_vars='SHELL PATH_SEPARATOR PACKAGE_NAME PACKAGE_TARNAME PACKAGE_VERSION PACKAGE_STRING PACKAGE_BUGREPORT exec_prefix prefix program_transform_name bindir sbindir libexecdir datadir sysconfdir sharedstatedir localstatedir libdir includedir oldincludedir infodir mandir build_alias host_alias target_alias DEFS ECHO_C ECHO_N ECHO_T LIBS configure_args build build_cpu build_vendor build_os host host_cpu host_vendor host_os PORTNAME docdir enable_nls WANTED_LANGUAGES default_port enable_shared enable_rpath enable_debug CC CFLAGS LDFLAGS CPPFLAGS ac_ct_CC EXEEXT OBJEXT CPP GCC TAS autodepend INCLUDES enable_thread_safety with_tcl with_perl with_python with_krb5 krb_srvtab with_pam with_ldap with_bonjour with_openssl with_zlib EGREP ELF_SYS LDFLAGS_SL AWK FLEX FLEXFLAGS LN_S LD with_gnu_ld ld_R_works RANLIB ac_ct_RANLIB TAR STRIP ac_ct_STRIP STRIP_STATIC_LIB STRIP_SHARED_LIB YACC YFLAGS PERL perl_archlibexp perl_privlibexp perl_useshrplib perl_embed_ldflags PYTHON python_version python_configdir python_includespec python_libdir python_libspec python_additional_libs HAVE_IPV6 LIBOBJS acx_pthread_config PTHREAD_CC PTHREAD_LIBS PTHREAD_CFLAGS HAVE_POSIX_SIGNALS MSGFMT MSGMERGE XGETTEXT localedir TCLSH TCL_CONFIG_SH TCL_INCLUDE_SPEC TCL_LIB_FILE TCL_LIBS TCL_LIB_SPEC TCL_SHARED_BUILD TCL_SHLIB_LD_LIBS NSGMLS JADE have_docbook DOCBOOKSTYLE COLLATEINDEX SGMLSPL vpath_build LTLIBOBJS' +ac_subst_vars='SHELL PATH_SEPARATOR PACKAGE_NAME PACKAGE_TARNAME PACKAGE_VERSION PACKAGE_STRING PACKAGE_BUGREPORT exec_prefix prefix program_transform_name bindir sbindir libexecdir datadir sysconfdir sharedstatedir localstatedir libdir includedir oldincludedir infodir mandir build_alias host_alias target_alias DEFS ECHO_C ECHO_N ECHO_T LIBS configure_args build build_cpu build_vendor build_os host host_cpu host_vendor host_os PORTNAME docdir enable_nls WANTED_LANGUAGES default_port enable_shared enable_rpath enable_debug DTRACE DTRACEFLAGS enable_dtrace CC CFLAGS LDFLAGS CPPFLAGS ac_ct_CC EXEEXT OBJEXT CPP GCC TAS autodepend INCLUDES enable_thread_safety with_tcl with_perl with_python with_krb5 krb_srvtab with_pam with_ldap with_bonjour with_openssl with_zlib EGREP ELF_SYS LDFLAGS_SL AWK FLEX FLEXFLAGS LN_S LD with_gnu_ld ld_R_works RANLIB ac_ct_RANLIB TAR STRIP ac_ct_STRIP STRIP_STATIC_LIB STRIP_SHARED_LIB YACC YFLAGS PERL perl_archlibexp perl_privlibexp perl_useshrplib perl_embed_ldflags PYTHON python_version python_configdir python_includespec python_libdir python_libspec python_additional_libs HAVE_IPV6 LIBOBJS acx_pthread_config PTHREAD_CC PTHREAD_LIBS PTHREAD_CFLAGS HAVE_POSIX_SIGNALS MSGFMT MSGMERGE XGETTEXT localedir TCLSH TCL_CONFIG_SH TCL_INCLUDE_SPEC TCL_LIB_FILE TCL_LIBS TCL_LIB_SPEC TCL_SHARED_BUILD TCL_SHLIB_LD_LIBS NSGMLS JADE have_docbook DOCBOOKSTYLE COLLATEINDEX SGMLSPL vpath_build LTLIBOBJS' ac_subst_files='' # Initialize some variables set by options. @@ -865,6 +865,7 @@ --disable-rpath do not embed shared library search path in executables --disable-spinlocks do not use spinlocks --enable-debug build with debugging symbols (-g) + --enable-dtrace build with DTrace support --enable-depend turn on automatic dependency tracking --enable-cassertenable assertion checks (for debugging) --enable-thread-safety make client libraries thread-safe @@ -1947,6 +1948,82 @@ # +# DTrace +# + + + +# Check whether --enable-dtrace or --disable-dtrace was given. +if test "${enable_dtrace+set}" = set; then + enableval="$enable_dtrace" + + case $enableval in +yes) + +cat >>confdefs.h <<\_ACEOF +#define ENABLE_DTRACE 1 +_ACEOF + +for ac_prog in dtrace +do + # Extract the first word of "$ac_prog", so it can be a program name with args. +set dummy $ac_prog; ac_word=$2 +echo "$as_me:$LINENO: checking for $ac_word" >&5 +echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 +if test "${ac_cv_prog_DTRACE+set}" = set; then + echo $ECHO_N "(cached) $ECHO_C" >&6
Re: [HACKERS] [PATCHES] Generic Monitoring Framework with DTrace patch
Peter Eisentraut wrote: Robert Lor wrote: The user needs to have the flexibility to build a 32 bit PG binary even when he run the 64 bit kernel. If I understand you correctly, your suggestion will not allow a 32 bit binary to be built on a 64 bit OS. I'm not sure about the context. How do you control whether the PostgreSQL binaries you are about to build end up 32 bit or 64 bit? Presumably there is some default, and you switch it using CFLAGS or LDFLAGS. To build 64 bit binary, I use the following flag depending on the compiler. Without -m64 or -xtarget=native64, it defaults to 32 bit. CC='gcc -m64' CC='//cc -xtarget=native64' Then it would make sense to let dtrace be controled by DTRACE_FLAGS or some such. But what does dtrace do if no flag at all is given? We want to be able to set DTRACEFLAGS to 32 or 64 (e.g. DTRACEFLAGS='64'). If DTRACEFLAGS is not set, can we provide a default value to 32? Otherwise, the compile will fail. It's also possible that the CC and DTRACEFLAGS are in conflict, and in that case the compile will also fail, which is probably okay. Regards, -Robert ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Tom Lane wrote: Also, it'd be interesting to count time spent holding shared lock separately from time spent holding exclusive. Tom, Here is the break down between exclusive & shared LWLocks. Do the numbers look reasonable to you? Regards, -Robert bash-3.00# time ./Tom_lwlock_acquire.d `pgrep -n postgres` ** LWLock Count: Exclusive ** Lock IdMode Count ControlFileLock Exclusive 1 FreeSpaceLock Exclusive 9 XidGenLock Exclusive 202 CLogControlLock Exclusive 203 WALWriteLock Exclusive 203 BgWriterCommLock Exclusive 222 BufFreelistLock Exclusive 305 BufMappingLock Exclusive 305 ProcArrayLock Exclusive 405 FirstLockMgrLock Exclusive 670 WALInsertLock Exclusive1616 ** LWLock Count: Shared ** Lock IdMode Count CheckpointStartLock Shared 202 CLogControlLock Shared 450 SubtransControlLock Shared 776 XidGenLock Shared2020 ProcArrayLock Shared3778 SInvalLock Shared4040 BufMappingLock Shared 40838 ** LWLock Time: Exclusive ** Lock Id Combined Time (ns) ControlFileLock 8301 FreeSpaceLock80590 CLogControlLock 1603557 BgWriterCommLock 1607122 BufFreelistLock 1997406 XidGenLock 2312442 BufMappingLock 3161683 FirstLockMgrLock 5392575 ProcArrayLock 6034396 WALInsertLock 12277693 WALWriteLock324869744 ** LWLock Time: Shared ** Lock Id Combined Time (ns) CLogControlLock 3183788 SubtransControlLock 6956229 XidGenLock 12012576 SInvalLock 35567976 ProcArrayLock 45400779 BufMappingLock300669441 CheckpointStartLock 4056134243 real0m24.718s user0m0.382s sys 0m0.181s ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [PATCHES] Generic Monitoring Framework with DTrace
korry wrote: How about the obvious DTRACE( ) or some similar variant? The idea is to keep the macro name generic since it can be mapped to other tracing facility on other platforms. Regards, -Robert ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Tom Lane wrote: Those numbers look a bit suspicious --- I'd expect to see some of the LWLocks being taken in both shared and exclusive modes, but you don't show any such cases. You sure your script is counting correctly? I'll double check to make sure no stupid mistakes were made! Also, it'd be interesting to count time spent holding shared lock separately from time spent holding exclusive. Will provide that data later today. Regards, -Robert ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [PATCHES] Generic Monitoring Framework with DTrace patch
Peter Eisentraut wrote: I would prefer to drop the PG_ prefixes on PG_TRACE and pg_trace.h. We know which software we're dealing with. I also agree with Martin & Tom to keep the PG_ prefixes. We should probably move the probes file to a subdirectory. Anyone know a good place? Also, again, the pgsql prefix should be dropped. To keep it consistent with the header file, perhaps it can be renamed to pg_probes.d Certainly doable, but will that be more reliable? Can't we convince dtrace to create binaries for the host platform by default? The user needs to have the flexibility to build a 32 bit PG binary even when he run the 64 bit kernel. If I understand you correctly, your suggestion will not allow a 32 bit binary to be built on a 64 bit OS. 3) When using --enable-depend, "gmake clean" removes all *.d files, I'm working on renaming the dependency files. Excellent! Thanks Peter for your help! Regards, -Robert ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Tom Lane wrote: Tatsuo Ishii <[EMAIL PROTECTED]> writes: 18% in s_lock is definitely bad :-(. Were you able to determine which LWLock(s) are accounting for the contention? Sorry for the delay. Finally I got the oprofile data. It's huge(34MB). If you are interested, I can put somewhere. Please let me know. I finally got a chance to look at this, and it seems clear that all the traffic is on the BufMappingLock. This is essentially the same problem we were discussing with respect to Gavin Hamill's report of poor performance on an 8-way IBM PPC64 box (see hackers archives around 2006-04-21). If your database is fully cached in shared buffers, then you can do a whole lot of buffer accesses per unit time, and even though all the BufMappingLock acquisitions are in shared-LWLock mode, the LWLock's spinlock ends up being heavily contended on an SMP box. It's likely that CVS HEAD would show somewhat better performance because of the btree change to cache local copies of index metapages (which eliminates a fair fraction of buffer accesses, at least in Gavin's test case). Getting much further than that seems to require partitioning the buffer mapping table. The last discussion stalled on my concerns about unpredictable shared memory usage, but I have some ideas on that which I'll post separately. In the meantime, thanks for sending along the oprofile data! regards, tom lane I ran pgbench and fired up a DTrace script using the lwlock probes we've added, and it looks like BufMappingLock is the most contended lock, but CheckpointStartLocks are held for longer duration! Lock IdMode Count ControlFileLock Exclusive 1 SubtransControlLock Exclusive 1 BgWriterCommLock Exclusive 6 FreeSpaceLock Exclusive 6 FirstLockMgrLock Exclusive 48 BufFreelistLock Exclusive 74 BufMappingLock Exclusive 74 CLogControlLock Exclusive 184 XidGenLock Exclusive 184 CheckpointStartLock Shared 185 WALWriteLock Exclusive 185 ProcArrayLock Exclusive 368 CLogControlLock Shared 552 SubtransControlLock Shared1273 WALInsertLock Exclusive1476 XidGenLock Shared1842 ProcArrayLock Shared3160 SInvalLock Shared3684 BufMappingLock Shared 14578 Lock Id Combined Time (ns) ControlFileLock 7915 BgWriterCommLock43438 FreeSpaceLock 39 BufFreelistLock 448530 FirstLockMgrLock 2879957 CLogControlLock 4237750 SubtransControlLock 6378042 XidGenLock 9500422 WALInsertLock 16372040 SInvalLock 23284554 ProcArrayLock 32188638 BufMappingLock113128512 WALWriteLock142391501 CheckpointStartLock 4171106665 Regards, -Robert ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Generic Monitoring Framework Proposal
Greg Stark wrote: It seems pointless to me to expose things like lwlock_acuire that map 1-1 to C function calls like LWLockAcquire. They're useless except to people who understand what's going on and if people know the low level implementation details of Postgres they can already trace those calls with dtrace without any help. lwlock_acquire is just an example. I think once we decided to down this path, we can solicit ideas for interesting probes and put them up for discussion on this alias whether or not they are needed. I think we need to have two categories of probes for admins and developers. Perhaps the probes for admins are more important since, as you said, the developers already know which function does what, but I think the low-level probes are still useful for new developers as there behavior will be documented. What would be useful is instrumenting high level calls that can't be traced without application guidance. For example, inserting a dtrace probe for each SQL and each plan node. That way someone could get the same info as EXPLAIN ANALYZE from a production server without having to make application modifications (or suffer the gettimeofday overhead). It's one thing to know "I seem to be acquiring a lot of locks" or "i'm spending all my time in sorting". It's another to be able to ask dtrace "what query am I running when doing all this sorting?" or "what kind of plan node am I running when I'm acquiring all these locks?" Completely agree. Regards, Robert ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Generic Monitoring Framework Proposal
Simon Riggs wrote: This needs to work on Linux and Windows, minimum, also. The proposed solution will work on Linux & Windows if they similar facility that the macros can map to. Otherwise, the macros stay as no-ops and will not affect those platforms at all. It's obviously impossible to move a production system to a different OS just to use a cool tracing tool. So the architecture must intelligently handle the needs of multiple OS - even if the underlying facilities on them do not yet provide what we'd like. So I'm OK with Solaris being the best, just as long as its not the only one that benefits. The way it's proposed now, any OS can use the same interfaces and map to their underlying facilities. Does it look reasonable? Regards, Robert ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Generic Monitoring Framework Proposal
Theo Schlossnagle wrote: Heh. Syscall probes and FBT probes in Dtrace have zero overhead. User-space probes do have overhead, but it is only a few instructions (two I think). Besically, the probe points are replaced by illegal instructions and the kernel infrastructure for Dtrace will fasttrap the ops and then act. So, it is tiny tiny overhead. Little enough that it isn't unreasonable to instrument things like s_lock which are tiny. Theo, you're a genius. FBT (funciton boundary tracing) probes have zero overhead (section 4.1) and user-space probes has two instructions over head (section 4.2). I was incorrect about making a general zero overhead statement. But it's so close to zero :-) http://www.sun.com/bigadmin/content/dtrace/dtrace_usenix.pdf The reason that Robert proposes user-space probes (I assume) is that tracing C functions can be too granular and not conveniently expose the "right" information to make tracing useful. Yes, I'm proposing user-space probes (aka User Statically-Defined Tracing - USDT). USDT provides a high-level abstraction so the application can expose well defined probes without the user having to know the detailed implementation. For example, instead of having to know the function LWLockAcquire(), a well documented probe called lwlock_acquire with the appropriate args is much more usable. Regards, Robert ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Generic Monitoring Framework Proposal
Tom Lane wrote: Robert Lor <[EMAIL PROTECTED]> writes: The main goal for this Generic Monitoring Framework is to provide a common interface for adding instrumentation points or probes to Postgres so its behavior can be easily observed by developers and administrators even in production systems. What is the overhead of a "probe" when you're not using it? The answer had better not include the phrase "kernel call", or this is unlikely to pass muster... Here's what the DTrace developers have to say in their Usenix paper. "When not explicitly enabled, DTrace has zero probe effect - the system operates exactly as if DTrace were not present at all." http://www.sun.com/bigadmin/content/dtrace/dtrace_usenix.pdf The technical details are beyond me, so I can't tell you exactly what happens internally. I can find out if you're interested! For DTrace, probes can be enabled using a D script. When the probes are not enabled, there is absolutely no performance hit whatsoever. If you believe that, I have a bridge in Brooklyn you might be interested in. What are the criteria going to be for where to put probe calls? If it has to be hard-wired into the source code, I foresee a lot of contention about which probes are worth their overhead, because we'll need one-size-fits-all answers. I think we need to be selective in terms of which probes to add since we don't want to scatter them all over the source files. For DTrace, the overhead is very minimal, but you're right, other implementation for the same probe may have more perf overhead. arg1..arg5 = Any args to pass to the probe such as txn id, lock id, etc Where is the data type of a probe argument defined? It's in a .d file which looks like below: provider pg_backend { probe fsync__start(void); probe fsync__end(void); probe lwlock__acquire (int, int); probe lwlock__release(int); ... } Regards, Robert ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] Generic Monitoring Framework Proposal
Motivation: -- The main goal for this Generic Monitoring Framework is to provide a common interface for adding instrumentation points or probes to Postgres so its behavior can be easily observed by developers and administrators even in production systems. This framework will allow Postgres to use the appropriate monitoring/tracing facility provided by each OS. For example, Solaris and FreeBSD will use DTrace, and other OSes can use their respective tool. What is DTrace? -- Some of you may have heard about or used DTrace already. In a nutshell, DTrace is a comprehensive dynamic tracing facility that is built into Solaris and FreeBSD (mostly working) that can be used by administrators and developers on live production systems to examine the behavior of both user programs and of the operating system. DTrace can help answer difficult questions about the OS and the application itself. For example, you may want to ask: - Show all functions that get invoked (userland & kernel) and execution time when my function foo() is called. Seeing the path a function takes into the kernel may provide clues for performance tuning. - Show how many times a particular lock is acquired and how long it's held. This can help identity contentions in the system. The best way to appreciate DTrace capabilities is by seeing a demo or through hands-on experience, and I plan to show some interesting demos at the PG Summit. There are a numer of docs on Dtrace, and here's a quick start doc and a complete reference guide. http://www.sun.com/software/solaris/howtoguides/dtracehowto.jsp http://docs.sun.com/app/docs/doc/817-6223 Here is a recent DTrace for FreeBSD status http://marc.theaimsgroup.com/?l=freebsd-current&m=114854018213275&w=2 Open source apps that provide user level probes (bottom of page) http://uadmin.blogspot.com/2006/05/what-is-dtrace.html Proposed Solution: This solution is actually quite simple and non-intrusive. 1. Define macros PG_TRACE, PG_TRACE1, etc, in a new header file called pg_trace.h with multiple #if defined(xxx) sections for Solaris, FreeBSD, Linux, etc, and add pg_trace.h to c.h which is included in postgres.h and included by every C file. The macros will have the following format: PG_TRACE[n](module_name, probe_name [, arg1, ..., arg5]) module_name = Name to identify PG module such as pg_backend, pg_psql, pg_plpgsql, etc probe_name = Probe name such as transaction_start, lwlock_acquire, etc arg1..arg5 = Any args to pass to the probe such as txn id, lock id, etc 2. Map PG_TRACE, PG_TRACE1, etc, to macros or functions appropriate for each OS. For OSes that don't have suitable tracing facility, just map the macros to nothing - doing this will not have any affect on performance or existing behavior. Sample of pg_trace.h #if defined(sun) || defined(FreeBSD) #include #define PG_TRACEDTRACE_PROBE #define PG_TRACE1 DTRACE_PROBE1 ... #define PG_TRACE5 DTRACE_PROBE5 #elif defined(__linux__) || defined(_AIX) || defined(__sgi) ... /* Map the macros to no-ops */ #define PG_TRACE(module, name) #define PG_TRACE1(module, name, arg1) ... #define PG_TRACE5(module, name, arg1, arg2, arg3, arg4, arg5) #endif 3. Add any file(s) to support the particular OS tracing facility 4. Update the Makefiles as necessary for each OS How to add probes: - To add a probe, just add a one line macro in the appropriate location in the source. Here's an example of two probes, one with no argument and the other with 2 arguments: PG_TRACE (pg_backend, fsync_start); PG_TRACE2 (pg_backend, lwlock_acquire, lockid, mode); If there are enough probes embedded in PG, its behavior can be easily observed. With the help of Gavin Sherry, we have added about 20 probes, and Gavin has suggested a number of other interesting areas for additional probes. Pervasive has also added some probes to PG 8.0.4 and posted the patch on http://pgfoundry.org/projects/dtrace/. I hope to combine the probes using this generic framework for 8.1.4, and make it available for folks to try. Since my knowledge of the PG source code is limited, I'm looking for assistance from experts to hep identify some new interesting probe points. How to use probes: For DTrace, probes can be enabled using a D script. When the probes are not enabled, there is absolutely no performance hit whatsoever. Here is a simple example to print out the number of LWLock counts for each PG process. test.d #!/usr/sbin/dtrace -s pg_backend*:::lwlock-acquire { @foo[pid] = count(); } dtrace:::END { printf("\n%10s %15s\n", "PID", "Count"); printa("%10d [EMAIL PROTECTED]",@foo); } # ./test.d PID Count 1438 28 1447 7240 1448 9675 1449 11972 I have a prototype working, so if anyone wants to try it, I can provide a patch or give access to my test system. This is a proposal, so comments,
Re: [HACKERS] [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL community
Arjen van der Meijden wrote: I can already confirm very good scalability (with our workload) on postgresql on that machine. We've been testing a 32thread/16G-version and it shows near-linear scaling when enabling 1, 2, 4, 6 and 8 cores (with all four threads enabled). The threads are a bit less scalable, but still pretty good. Enabling 1, 2 or 4 threads for each core yields resp 60 and 130% extra performance. Wow, what type of workload is it? And did you do much tuning to get near-linear scalability to 32 threads? Regards, -Robert ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] Sun Donated a Sun Fire T2000 to the PostgreSQL community
I am thrill to inform you all that Sun has just donated a fully loaded T2000 system to the PostgreSQL community, and it's being setup by Corey Shields at OSL (osuosl.org) and should be online probably early next week. The system has * 8 cores, 4 hw threads/core @ 1.2 GHz. Solaris sees the system as having 32 virtual CPUs, and each can be enabled or disabled individually * 32 GB of DDR2 SDRAM memory * 2 @ 73GB internal SAS drives (1 RPM) * 4 Gigabit ethernet ports For complete spec, visit http://www.sun.com/servers/coolthreads/t2000/specifications.jsp I think this system is well suited for PG scalability testing, among others. We did an informal test using an internal OLTP benchmark and noticed that PG can scale to around 8 CPUs. Would be really cool if all 32 virtual CPUs can be utilized!!! Anyways, if you need to access the system for testing purposes, please contact Josh Berkus. Regards, Robert Lor Sun Microsystems, Inc. 01-510-574-7189 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] PostgreSQL Solaris packages now in beta
Bruce, The binary was compiled in 32bit mode using Sun Studio compiler, and we plan to do 64bit soon. Can you point me to the patch? We can certainly test it! Regards, Robert Bruce Momjian wrote: Do they work on x86-64 processors using the Solaris compiler? We have a patch for it in CVS but it is not in 8.1.X, and we could use someone to eyeball and test it. --- ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] PostgreSQL Solaris packages now in beta
With big thanks to Josh Berkus and Devrim Gunduz, I'm happy to announce that Sun has just released a Solaris distribution of PostgreSQL 8.1.2 with ready-to-install packages for both Sparc and x86. These packages are currently in Beta, and we expect to FCS in 2 -3 weeks. The packages, along with an install guide, are available for download at http://pgfoundry.org/projects/solarispackages/ We have tightly integrated PostgreSQL with Solaris in a manner similar to the Linux distributions available on postgresql.org. In fact, the directory structures are identical. Starting with Solaris 10 Update 2, PostgreSQL will be distributed with every copy of Solaris, via download and physical media. We welcome any and all feedback on this PostgreSQL Solaris distribution. Please subscribe to the [EMAIL PROTECTED] mailing list to give us feedback: http://pgfoundry.org/mail/?group_id=163 BTW, I'm a senior engineer at Sun Microsystems, recently working with the PostgreSQL community (namely Josh Berkus, Devrim Gunduz, and Gavin Sherry) on the Solaris Packages Project at PgFoundry, PostgreSQL performance optimization on Solaris, and leveraging Solaris 10 capabilities (e.g. DTrace) specifically for PostgreSQL. I'll be posting a Solaris performance tuning guide in a few weeks. Regards, Robert Lor ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster