Re: [HACKERS] contrib/pg_stat_statements 1226

2009-01-05 Thread Gregory Stark
Tom Lane  writes:

> 2. I'm unconvinced by the proposed changes to accumulate backend-local
> I/O counters, too.  The fact of the matter is that those counters are
> left over from Berkeley days, a time when PG hackers tended to do their
> performance measurements in standalone backends (!).  They're obviously
> not the full story now on write measurements, and I don't have any
> confidence in them as read measurements either, particularly seeing that
> the wave of the future is likely to be asynchronous read operations
> (with the posix_fadvise patch and foreseeable follow-on work).  I think
> those counters should more likely be done away with than
> institutionalized in EXPLAIN ANALYZE output.  You can get more reliable
> information about what's happening from the existing pgstats system-wide
> I/O counts.

It's clear that these counters are not the whole picture. But I think that
just means we need more counters with more sources of data, not that we need
to get rid of the ones we have. There's no reason we shouldn't have counters
for advised buffers along with reads.

I'm also picturing using dtrace to find out how many reads were satisfied from
cache and how many required physical reads for example.

The system-wide stats satisfy a very different need from this. The sysadmin or
DBA might want to know about system-wide stats but a programmer or a DBA
analyzing a specific query needs to know what that query is doing.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] contrib/pg_stat_statements 1226

2009-01-04 Thread ITAGAKI Takahiro
Thank you for many works.

Tom Lane  wrote:

> 1. The proposed change to track system/user CPU time presents an
> enormous cost, and no argument has been made to show that there is any
> comparable benefit.

> 2. I'm unconvinced by the proposed changes to accumulate backend-local
> I/O counters, too.  The fact of the matter is that those counters are
> left over from Berkeley days, a time when PG hackers tended to do their
> performance measurements in standalone backends (!).

Ok, I need to reconsider performance and design of I/O counters.
I think those information is still useful because we can determine
not only which query is bad, but also why the query is bad 
*without repeating the query in production servers*.
But the reworking would be done in 8.5...


> * I changed the default track setting to "top".  I don't see the
> likelihood that someone would load this module into their server
> and not want it turned on.

Looks good. Setting shared_preload_libraries is enough for normal use.

> * I'm not entirely seeing the point of a server-wide tracking facility
> that only counts SELECT/INSERT/UPDATE/DELETE.  ISTM this should be
> modified to count utility commands too; which probably means adding some
> hooks around ProcessUtility (and what about autovacuum?).  I left that
> work for someone else to do, though.

Sure, but we should also consider utility commands should not kick out
DML queries because maintenance commands  take long time typically.

> * As already mentioned I find the entry_dealloc logic pretty darn
> dubious; but I didn't touch that either in this go-round.  If we do
> keep it in its current form, ISTM that usage ought to be proportional
> to total execution time not total call count.

I think the current implementation is not ideal, too.
I also don't like using fixed-size shared memory. I'm thinking
that shared memory used by extended modules to be allocated
from the shared buffer pool in the future. If we could do it,
memory management will be flexible and we can load libraries
after server starts.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] contrib/pg_stat_statements 1226

2009-01-04 Thread Tom Lane
ITAGAKI Takahiro  writes:
> Here is an updated version of contrib/pg_stat_statements patch.

I've committed this with significant revisions.  Other than the points
already mentioned in previous messages:

* I removed the proposed changes to the behavior of the core EXPLAIN
code.  I think that that should be submitted and discussed as a separate
patch, not slide in under the misleading title of being a "contrib
module".  I'm personally against those changes anyway, on two grounds:

1. The proposed change to track system/user CPU time presents an
enormous cost, and no argument has been made to show that there is any
comparable benefit.  The change causes each EXPLAIN ANALYZE tracking
call to invoke getrusage() as well as gettimeofday().  I did a little
bit of testing and found that this is over seven times slower on Fedora
9 on x86_64 (Xeon hardware) and over twenty-seven times slower on Darwin
(on Core 2 Duo hardware).  Considering that EXPLAIN ANALYZE overhead is
already higher than anyone would like, you would need a pretty darn
convincing argument to persuade us to accept that kind of slowdown.
At the very least the code would need to be modified so that it doesn't
execute getrusage() unless the user is actually going to look at the
results.

2. I'm unconvinced by the proposed changes to accumulate backend-local
I/O counters, too.  The fact of the matter is that those counters are
left over from Berkeley days, a time when PG hackers tended to do their
performance measurements in standalone backends (!).  They're obviously
not the full story now on write measurements, and I don't have any
confidence in them as read measurements either, particularly seeing that
the wave of the future is likely to be asynchronous read operations
(with the posix_fadvise patch and foreseeable follow-on work).  I think
those counters should more likely be done away with than
institutionalized in EXPLAIN ANALYZE output.  You can get more reliable
information about what's happening from the existing pgstats system-wide
I/O counts.

* I changed the default track setting to "top".  I don't see the
likelihood that someone would load this module into their server
and not want it turned on.

* I'm not entirely seeing the point of a server-wide tracking facility
that only counts SELECT/INSERT/UPDATE/DELETE.  ISTM this should be
modified to count utility commands too; which probably means adding some
hooks around ProcessUtility (and what about autovacuum?).  I left that
work for someone else to do, though.

* As already mentioned I find the entry_dealloc logic pretty darn
dubious; but I didn't touch that either in this go-round.  If we do
keep it in its current form, ISTM that usage ought to be proportional
to total execution time not total call count.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] contrib/pg_stat_statements 1226

2009-01-04 Thread Tom Lane
I wrote:
> I'm not enamored of "saved_file" either, it seems like the wrong
> part of speech somehow.  Maybe "save_in_file"?

Actually ... what is the point of letting users control the filename at
all?  It seems like the only useful nondefault value is the empty string
(to suppress storing the stats).  So why not simplify this to a boolean?
"pg_stat_statements.save" = on or off.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] contrib/pg_stat_statements 1226

2009-01-03 Thread Tom Lane
ITAGAKI Takahiro  writes:
> "Alex Hunsaker"  wrote:
>> How about just pg_stat_statements.track ?

> I renamed the variables to:
> - pg_stat_statements.limit
> - pg_stat_statements.track
> - pg_stat_statements.saved_file

Hmm, this has got a problem:

regression=# show pg_stat_statements.limit ;
ERROR:  syntax error at or near "limit"
LINE 1: show pg_stat_statements.limit ;
^

LIMIT is a reserved word ...

I thought max_statements was fine, but if people think that's too long
I guess we could go with just "max".

I'm not enamored of "saved_file" either, it seems like the wrong
part of speech somehow.  Maybe "save_in_file"?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] contrib/pg_stat_statements 1226

2009-01-03 Thread Alex Hunsaker
On Fri, Jan 2, 2009 at 20:20, Tom Lane  wrote:
> I wrote:
>> * the startup/shutdown hooks will be installed in the postmaster
>> process, but the patch expects them to be executed in a child process.
>> I think nothing will happen.
>
> OK, I figured out what is happening there: the patch makes it work by
> means of this expedient:



> I find this mighty Rube Goldbergian.  We have a startup hook which is
> declared in include/storage/ipc.h, but defined and called in bootstrap.c
> (whence it will actually be executed down inside the startup process).
> We have a shutdown hook which is also declared in include/storage/ipc.h,
> but defined and called in bgwriter.c (executed in the bgwriter process,
> of course).  And to make those hooks work in the EXEC_BACKEND case, we
> have a kluge inserted in proc.c, miles away from where the existing
> process_shared_preload_libraries() calls are (in postmaster.c), and with
> extremely high probability of someday resulting in duplicate preload
> operations if the postmaster.c code gets shuffled.

Kudos to Itagaki-san for getting that to work?

> As for the shutdown hook, I don't think we need it at all in this
> design.  When loaded into the postmaster process, pg_stat_statements can
> insert itself into the on_proc_exit or on_shmem_exit hook lists ... it
> doesn't need a private hook.

Ok cool.

> The right way to make that happen is to rearrange the coding in
> SubPostmasterMain() so that process_shared_preload_libraries is
> done in all cases, just after the read_nondefault_variables call.

This should also fix the rmg hooks patch on EXEC_BACKEND.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] contrib/pg_stat_statements 1226

2009-01-02 Thread Tom Lane
I wrote:
> * the startup/shutdown hooks will be installed in the postmaster
> process, but the patch expects them to be executed in a child process.
> I think nothing will happen.

OK, I figured out what is happening there: the patch makes it work by
means of this expedient:

diff -cprN HEAD/src/backend/storage/lmgr/proc.c 
pg_stat_statements/src/backend/storage/lmgr/proc.c
*** HEAD/src/backend/storage/lmgr/proc.c2008-12-10 02:03:02.36659 
+0900
--- pg_stat_statements/src/backend/storage/lmgr/proc.c  2008-12-26 
14:51:58.06250 +0900
*** InitAuxiliaryProcess(void)
*** 381,386 
--- 381,390 
if (MyProc != NULL)
elog(ERROR, "you already exist");
  
+ #ifdef EXEC_BACKEND
+   process_shared_preload_libraries();
+ #endif
+ 
/*
 * We use the ProcStructLock to protect assignment and releasing of
 * AuxiliaryProcs entries.

I find this mighty Rube Goldbergian.  We have a startup hook which is
declared in include/storage/ipc.h, but defined and called in bootstrap.c
(whence it will actually be executed down inside the startup process).
We have a shutdown hook which is also declared in include/storage/ipc.h,
but defined and called in bgwriter.c (executed in the bgwriter process,
of course).  And to make those hooks work in the EXEC_BACKEND case, we
have a kluge inserted in proc.c, miles away from where the existing
process_shared_preload_libraries() calls are (in postmaster.c), and with
extremely high probability of someday resulting in duplicate preload
operations if the postmaster.c code gets shuffled.

I don't really care for the idea of initializing the pg_stat_statements
shared memory down inside the startup process.  All the rest of shared
memory is initialized by the postmaster process itself, and I think
pg_stat_statements should probably work the same way.  Normally I am
against having more functionality in the postmaster than absolutely
necessary, but I don't see any robustness loss in this situation: if we
have the pg_stat_statements init code in the startup subprocess, and it
dies, we'll abort startup anyway.  So we might as well run it directly
in the postmaster.  I think the right place is probably at the bottom of
CreateSharedMemoryAndSemaphores().  This will serve two purposes: to
create the pg_stat_statements shared memory when run in the postmaster,
or to re-attach to it when starting an EXEC_BACKEND child.  The existing
coding in the patch that will try to create or attach to the shared
memory at any old random instant ought to go away --- if one of these
functions is called and doesn't find the pgss shared memory pointer
ready-to-go, it should error out or return quietly as appropriate.
That would indicate that pg_stat_statements.so got loaded into an
already-running backend, which means that the shared memory situation
can't possibly be right.

As for the shutdown hook, I don't think we need it at all in this
design.  When loaded into the postmaster process, pg_stat_statements can
insert itself into the on_proc_exit or on_shmem_exit hook lists ... it
doesn't need a private hook.

BTW, as for the question of where to call
process_shared_preload_libraries: we currently have postmaster.c doing
this for itself, and when spawning a regular backend child in the
EXEC_BACKEND case.  We don't do it for other child process types;
which is the omission that Itagaki-san tried to work around with
the above diff hunk.  You could argue that this is a case of premature
optimization (a/k/a the root of all evil).  I think the idea was that
we'd not possibly need any loadable libraries installed in special child
processes --- but that seems more than a little bit dubious if you
think about a loadable library whose goal is to monitor system activity,
as opposed to implementing some SQL-callable functionality.  Moreover
it fails to duplicate what will happen in the non-EXEC_BACKEND case;
all child processes will inherit loadable libraries then.  So I'm
thinking that Itagaki-san had the right idea in wanting to make
auxiliary processes load libraries, just the wrong implementation.
The right way to make that happen is to rearrange the coding in
SubPostmasterMain() so that process_shared_preload_libraries is
done in all cases, just after the read_nondefault_variables call.

Comments?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] contrib/pg_stat_statements 1226

2009-01-01 Thread Alex Hunsaker
On Thu, Jan 1, 2009 at 19:59, Tom Lane  wrote:
> I wrote:
>> * in an EXEC_BACKEND situation, we re-execute
>> process_shared_preload_libraries() when starting a fresh backend
>> (but not in other kinds of child processes, which is why the other
>> problem is a problem).  This means re-executing the _PG_init function,
>> which will try to redefine the custom GUC variables, which will fail.
>> I don't think this is really a bug in this patch per se, it's a bug
>> in the custom-GUC support; but nonetheless it looks like a problem.
>
> Oh, never mind that part.  I was thinking that the child process would
> already know the real definition of the custom variable at the time it
> tries to load the shared library, but actually the mechanism for pushing
> GUC values into EXEC_BACKEND child processes doesn't transfer the whole
> variable definition.  It causes any such values to be loaded as
> placeholders, and then the later load of the shared library converts the
> placeholder to a regular variable.

> So it all works, or nearly anyway:
> the code fails on a custom variable class whose name alphabetically
> precedes "custom_variable_class".

Cool!  Err interesting...

> http://archives.postgresql.org/pgsql-committers/2009-01/msg8.php

Yeah I saw your commits just shortly after hitting send on my reply :)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] contrib/pg_stat_statements 1226

2009-01-01 Thread Tom Lane
I wrote:
> * in an EXEC_BACKEND situation, we re-execute
> process_shared_preload_libraries() when starting a fresh backend
> (but not in other kinds of child processes, which is why the other
> problem is a problem).  This means re-executing the _PG_init function,
> which will try to redefine the custom GUC variables, which will fail.
> I don't think this is really a bug in this patch per se, it's a bug
> in the custom-GUC support; but nonetheless it looks like a problem.

Oh, never mind that part.  I was thinking that the child process would
already know the real definition of the custom variable at the time it
tries to load the shared library, but actually the mechanism for pushing
GUC values into EXEC_BACKEND child processes doesn't transfer the whole
variable definition.  It causes any such values to be loaded as
placeholders, and then the later load of the shared library converts the
placeholder to a regular variable.  So it all works, or nearly anyway:
the code fails on a custom variable class whose name alphabetically
precedes "custom_variable_class".

http://archives.postgresql.org/pgsql-committers/2009-01/msg8.php

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] contrib/pg_stat_statements 1226

2009-01-01 Thread Alex Hunsaker
On Thu, Jan 1, 2009 at 17:28, Tom Lane  wrote:
> "Alex Hunsaker"  writes:
>> ...  So Im going to mark it as
>> ready for commmiter.
>
> Has this patch been tested on Windows?  (Or more generally, with 
> EXEC_BACKEND?)

I was under the impression thats where Itagaki-san develops.You'll
note some other specific windows changes:

pgstat_track_activity_query_size gains PGDLLIMPORT
process_shared_preload_libraries()
  also seems of intreset:
http://archives.postgresql.org/pgsql-hackers/2008-12/msg01416.php

varoius carriage returns in the patch...

I could be wrong though.

> The reason I ask is that eyeballing the code suggests a couple of major
> problems in that area:
>
> * the startup/shutdown hooks will be installed in the postmaster
> process, but the patch expects them to be executed in a child process.
> I think nothing will happen.

I dunno about this one, not very familiar with EXEC_BACKEND

> * in an EXEC_BACKEND situation, we re-execute
> process_shared_preload_libraries() when starting a fresh backend
> (but not in other kinds of child processes, which is why the other
> problem is a problem).  This means re-executing the _PG_init function,
> which will try to redefine the custom GUC variables, which will fail.
> I don't think this is really a bug in this patch per se, it's a bug
> in the custom-GUC support; but nonetheless it looks like a problem.

I see 3 options:
- add a GUC_CUSTOM_REDEFINE flag

- ignore redefines of custom gucs

-change the define_custom_variable() to return a bool (or something)
 true if it got added
 false if it was already there

Seems to me we could probably just ignore any redefines of custom gucs
outright.  Im not to worried about some other module picking the same
custom guc.  And frankly the op should notice.  Especially when they
go to add it to custom_variable_classes.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] contrib/pg_stat_statements 1226

2009-01-01 Thread Tom Lane
"Alex Hunsaker"  writes:
> ...  So Im going to mark it as
> ready for commmiter.

Has this patch been tested on Windows?  (Or more generally, with EXEC_BACKEND?)

The reason I ask is that eyeballing the code suggests a couple of major
problems in that area:

* the startup/shutdown hooks will be installed in the postmaster
process, but the patch expects them to be executed in a child process.
I think nothing will happen.

* in an EXEC_BACKEND situation, we re-execute
process_shared_preload_libraries() when starting a fresh backend
(but not in other kinds of child processes, which is why the other
problem is a problem).  This means re-executing the _PG_init function,
which will try to redefine the custom GUC variables, which will fail.
I don't think this is really a bug in this patch per se, it's a bug
in the custom-GUC support; but nonetheless it looks like a problem.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] contrib/pg_stat_statements 1226

2008-12-30 Thread Alex Hunsaker
On Mon, Dec 29, 2008 at 15:46, Alex Hunsaker  wrote:
> On Thu, Dec 25, 2008 at 23:04, ITAGAKI Takahiro
>  wrote:
>> Here is an updated version of contrib/pg_stat_statements patch.
>
> Im going to do some more testing, give the typedef Chunk stuff another
> look (you did fix the race/not protected by a spinlock  you found
> earlier right?) .  And if all looks good mark it as ready for
> commiter.  (Assuming I find time tonight :))

Ok i ran a portion of tpc-h with 4 simultaneous clients doing the same
query on a 32bit dual core machine about a hundred times (lets call
this A).  I then compared the # of calls, # of gets and # rows from
pg_stat_statements view to a single run (lets cal this B) to make sure
they matched (i.e. made sure A.gets = A.calls * B.gets), that part all
looks good.  So I think if there was a race you squashed it.  My only
nit this time around was patch complaining about (Stripping trailing
CRs from patch.) but that's no big deal.  So Im going to mark it as
ready for commmiter.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] contrib/pg_stat_statements 1226

2008-12-29 Thread Alex Hunsaker
On Thu, Dec 25, 2008 at 23:04, ITAGAKI Takahiro
 wrote:
> Here is an updated version of contrib/pg_stat_statements patch.

Im going to do some more testing, give the typedef Chunk stuff another
look (you did fix the race/not protected by a spinlock  you found
earlier right?) .  And if all looks good mark it as ready for
commiter.  (Assuming I find time tonight :))

I think the other questions a -commiter needs to respond to, namely:
- explain_analyze_format guc, do we want it?

- queryDesc->sourceText changes
   (look good to me though, and I don't see any other obvious way to do it)

- rtime/utime to explain analyze printout

- moves statistic counters (ReadBufferCount etc...) into struct
Instrumentation via a new struct BufferCounter
(looks like a good cleanup regardless...)

> Should I also rename variables used in auto_explain module?
> It uses 'explain.*' now.

Well in the school of second thought you *do* have to manually define
them in custom_variable_classes, so maybe its fine.  Personally though
I would like them to be auto_explain.*... it seems harder to mistake
them later as having something to do EXPLAIN.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] contrib/pg_stat_statements 1226

2008-12-25 Thread ITAGAKI Takahiro
Here is an updated version of contrib/pg_stat_statements patch.

"Alex Hunsaker"  wrote:

> >> I think the explain_analyze_format guc is a clever way of getting
> >> around the explain analyze verbose you proposed earlier.  But I dont
> >> see any doc updates for it.

Documentation is added.

> How about just pg_stat_statements.track ?

I renamed the variables to:
- pg_stat_statements.limit
- pg_stat_statements.track
- pg_stat_statements.saved_file

I also modified assign_custom_variable_classes()
to accept '_' as a prefix character, not only 0-9A-Za-z.

> I do like the consistency of having the custom gucs be the same as the
> module name, easy to grep or google for.

Should I also rename variables used in auto_explain module?
It uses 'explain.*' now.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



pg_stat_statements.1226.tar.gz
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers