Re: [HACKERS] log_duration is redundant, no?
On 9/16/06, Alvaro Herrera [EMAIL PROTECTED] wrote: It might make sense to log _what_ is going on, without telling all the little details, for example LOG: parse duration: 0.250 ms LOG: bind duration: 0.057 ms LOG: execute my_query: SELECT * FROM shop WHERE $1 = $2 DETAIL: parameters: $1 = 'Clothes Clothes Clothes', $2 = 'Joe''s Widgets' It's not really the idea when you use log_statement and log_duration. Lines are completely different semantically speaking. So you should have: LOG: parse (log_statement) LOG: duration: 0.250 ms (log_duration) -- Guillaume ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] log_duration is redundant, no?
On 9/16/06, Tom Lane [EMAIL PROTECTED] wrote: The only asymmetry in the thing is that if log_statement fired then we suppress duplicate printing of the query in the later duration log message (if any) for that query. But that seems like the right thing if you're at all concerned about log volume. Perhaps I'm not representative of the users of these settings but when I used log_statement='all', I didn't really care about the log volume. I knew it really generates a lot of log lines and it slows down my database. My only concern was that we now have less information with log_statement='all' than with log_min_duration_statement. That said, I don't use it myself now: I use exclusively log_min_duration_statement and log_duration. So if you think it's better like that, it's ok for me. Does anyone else have an opinion about this? -- Guillaume ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] log_duration is redundant, no?
Guillaume Smet [EMAIL PROTECTED] writes: My only concern was that we now have less information with log_statement='all' than with log_min_duration_statement. Well, you don't have the durations, but log_statement isn't supposed to tell you that. So I'm still quite confused about what it is that you want to do differently. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] log_duration is redundant, no?
On 9/8/06, Tom Lane [EMAIL PROTECTED] wrote: It's done already ... (Working on implementing the last changes you made in formatting in pgFouine) Is it normal that when I set log_duration to on and log_statement to all, I have the following output when I prepare/bind/execute a prepared statement using the protocol: LOG: duration: 0.250 ms LOG: duration: 0.057 ms LOG: execute my_query: SELECT * FROM shop WHERE $1 = $2 DETAIL: parameters: $1 = 'Clothes Clothes Clothes', $2 = 'Joe''s Widgets' I suppose the first line is the prepare and the second line is the bind but I'm not sure it's the desired behaviour. Any comment? -- Guillaume ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] log_duration is redundant, no?
Guillaume Smet [EMAIL PROTECTED] writes: Is it normal that when I set log_duration to on and log_statement to all, I have the following output when I prepare/bind/execute a prepared statement using the protocol: LOG: duration: 0.250 ms LOG: duration: 0.057 ms LOG: execute my_query: SELECT * FROM shop WHERE $1 = $2 DETAIL: parameters: $1 = 'Clothes Clothes Clothes', $2 = 'Joe''s Widgets' I suppose the first line is the prepare and the second line is the bind but I'm not sure it's the desired behaviour. Well, considering that the parse and bind may take longer than the execute, I hardly think we want to ignore them for log_duration purposes. And we agreed that if log_duration is on and log_min_duration_statement is not triggered, log_duration should print *only* duration. So I'm not sure what else you expected. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] log_duration is redundant, no?
On 9/16/06, Tom Lane [EMAIL PROTECTED] wrote: Well, considering that the parse and bind may take longer than the execute, I hardly think we want to ignore them for log_duration purposes. And we agreed that if log_duration is on and log_min_duration_statement is not triggered, log_duration should print *only* duration. So I'm not sure what else you expected. I don't know exactly what I expected. I'm just surprised to have only the duration when log_statement is set to all. If we consider that the prepare and the bind operations are important (and I agree they can be), I wonder why do we remove the output we have when log_min_duration_statement is set to 0 (I'm thinking of the parse: and bind: lines)? (sorry for the double post, I forgot to cc: the list) ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] log_duration is redundant, no?
Guillaume Smet [EMAIL PROTECTED] writes: If we consider that the prepare and the bind operations are important (and I agree they can be), I wonder why do we remove the output we have when log_min_duration_statement is set to 0 (I'm thinking of the parse: and bind: lines)? Well, we remove it for the execute: too if you have only log_duration on. My view of this is that log_duration is meant to find out the total amount of time spent doing stuff, and you set log_min_duration_statement to whatever your threshold of pain is for the amount of time spent doing a single thing. If it's less than log_min_duration_statement then you're saying you don't care about the details of that individual step, only the aggregate runtime. log_statement has another goal entirely, which is to record *what* is being done in a semantic sense, and so for that I think it makes sense to log only actual executions (and not parse/bind leading up to 'em). The only asymmetry in the thing is that if log_statement fired then we suppress duplicate printing of the query in the later duration log message (if any) for that query. But that seems like the right thing if you're at all concerned about log volume. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] log_duration is redundant, no?
Tom Lane wrote: Guillaume Smet [EMAIL PROTECTED] writes: If we consider that the prepare and the bind operations are important (and I agree they can be), I wonder why do we remove the output we have when log_min_duration_statement is set to 0 (I'm thinking of the parse: and bind: lines)? Well, we remove it for the execute: too if you have only log_duration on. My view of this is that log_duration is meant to find out the total amount of time spent doing stuff, and you set log_min_duration_statement to whatever your threshold of pain is for the amount of time spent doing a single thing. If it's less than log_min_duration_statement then you're saying you don't care about the details of that individual step, only the aggregate runtime. It might make sense to log _what_ is going on, without telling all the little details, for example LOG: parse duration: 0.250 ms LOG: bind duration: 0.057 ms LOG: execute my_query: SELECT * FROM shop WHERE $1 = $2 DETAIL: parameters: $1 = 'Clothes Clothes Clothes', $2 = 'Joe''s Widgets' -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] log_duration is redundant, no?
Alvaro Herrera wrote: Bruce Momjian wrote: #2, I think, but I am confused if you don't know the query, how valuable is the log_duration. Statistics? I doubt that there is a statistical merit to calculating aggregate values over the duration of an anonymous set of queries. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] log_duration is redundant, no?
On Sat, Sep 09, 2006 at 06:33:10PM +0200, Peter Eisentraut wrote: Alvaro Herrera wrote: Bruce Momjian wrote: #2, I think, but I am confused if you don't know the query, how valuable is the log_duration. Statistics? I doubt that there is a statistical merit to calculating aggregate values over the duration of an anonymous set of queries. How heavily loaded is the server is a perfectly legitimate metric to have available, especially when more detailed, i.e. more invasive probes could bring it down. Cheers, D -- David Fetter [EMAIL PROTECTED] http://fetter.org/ phone: +1 415 235 3778AIM: dfetter666 Skype: davidfetter Remember to vote! ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] log_duration is redundant, no?
On Sat, Sep 09, 2006 at 06:33:10PM +0200, Peter Eisentraut wrote: Alvaro Herrera wrote: Bruce Momjian wrote: #2, I think, but I am confused if you don't know the query, how valuable is the log_duration. Statistics? I doubt that there is a statistical merit to calculating aggregate values over the duration of an anonymous set of queries. Eh? Sure there is, if you want totals per database, number of queries and where they were spread over the day. A perfectly reasonable example was given in this thread. If you're a service provider you don't care about what queries the users are doing, just how many resources they're taking. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate. signature.asc Description: Digital signature
Re: [HACKERS] log_duration is redundant, no?
On 9/8/06, Bruce Momjian [EMAIL PROTECTED] wrote: Alvaro Herrera wrote: Statistics? Oh, interesting. We build this type of report for our customers: http://pgfouine.projects.postgresql.org/reports/sample_hourly.html This one is a real one. As you can see, we cannot tell the type of every query as we don't log the text of all of them (we identified 138,788 of the 12,358,514 queries executed this day). It helped to detect there was a weird behaviour at 11am which was not a good idea as we have a lot of visits at 11am. We asked our customers if there was any cronjob running at 11am we could move to another time when the load was less intensive as it really slowed down the database. Sometimes we also detect that we suddenly have a big up in the number of queries executed (and it can be small queries we cannot detect with log_min_duration_statement = 500) without the same increase on the HTTP side. We contact our customer to ask him what happens and if it's normal. If it's not, they take a look at the changes they made to the code the previous day. Sometimes, we can even give them a hint because it's at a specific time and we can suspect it's a cronjob running. If they find the problem, it's directly corrected and we can check the next day if it's okay. If they don't find the problem, then we enable full query logging for a short time and we analyze them to find the offending queries. And sometimes, in the case we just host the database without developing the application, we just simply need it for contractual reasons to demonstrate why the server is suddenly becoming slow. I hope it helps to clarify how we use it. -- Guillaume ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] log_duration is redundant, no?
Tom, On 9/8/06, Tom Lane [EMAIL PROTECTED] wrote: It seems like we should either remove the separate log_duration boolean or make it work as he suggests. I'm leaning to the second answer now. Do you want me to propose a patch or do you prefer to work on it yourself? If so, do we keep the log_duration name or do we change it to log_all_duration or another more appropriate name? I attached the little patch I use to apply on our packages. I can work on it to make it apply to HEAD and update the documentation. I suppose we should also change the FE/BE protocol logging accordingly but ISTM you already planned to change it for other reasons. -- Guillaume Index: src/backend/tcop/postgres.c === RCS file: /projects/cvsroot/pgsql/src/backend/tcop/postgres.c,v retrieving revision 1.483 diff -u -r1.483 postgres.c --- src/backend/tcop/postgres.c 4 Apr 2006 19:35:35 - 1.483 +++ src/backend/tcop/postgres.c 6 Apr 2006 12:12:00 - @@ -1092,14 +1092,6 @@ usecs = (long) (stop_t.tv_sec - start_t.tv_sec) * 100 + (long) (stop_t.tv_usec - start_t.tv_usec); - /* Only print duration if we previously printed the statement. */ - if (was_logged save_log_duration) - ereport(LOG, - (errmsg(duration: %ld.%03ld ms, - (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 + - (stop_t.tv_usec - start_t.tv_usec) / 1000), - (long) (stop_t.tv_usec - start_t.tv_usec) % 1000))); - /* * Output a duration_statement to the log if the query has exceeded * the min duration, or if we are to print all durations. @@ -1107,6 +1099,7 @@ if (save_log_min_duration_statement == 0 || (save_log_min_duration_statement 0 usecs = save_log_min_duration_statement * 1000)) + { ereport(LOG, (errmsg(duration: %ld.%03ld ms statement: %s%s, (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 + @@ -1114,6 +1107,17 @@ (long) (stop_t.tv_usec - start_t.tv_usec) % 1000, query_string, prepare_string ? prepare_string : ))); + } + else + { + /* Print duration if we did not print it before. */ + if (save_log_duration) +ereport(LOG, + (errmsg(duration: %ld.%03ld ms, +(long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 + + (stop_t.tv_usec - start_t.tv_usec) / 1000), + (long) (stop_t.tv_usec - start_t.tv_usec) % 1000))); + } } if (save_log_statement_stats) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] log_duration is redundant, no?
Guillaume Smet [EMAIL PROTECTED] writes: Do you want me to propose a patch or do you prefer to work on it yourself? It's done already ... regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] log_duration is redundant, no?
AFAICS, there is absolutely no difference anymore between turning log_duration ON and setting log_min_duration_statement to zero. ISTM that having the two redundant GUC settings is just confusing, and we should remove log_duration to simplify things. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] log_duration is redundant, no?
Tom, On 9/7/06, Tom Lane [EMAIL PROTECTED] wrote: AFAICS, there is absolutely no difference anymore between turning log_duration ON and setting log_min_duration_statement to zero. ISTM that having the two redundant GUC settings is just confusing, and we should remove log_duration to simplify things. log_duration can be useful if we restore a behaviour similar to what it did in 7.4 ie when you can log duration only for every query. This way you can have a global overview of your database activity. I mean: log_duration = on log_min_duration_statement = 500 would log only duration for queries faster than 500 ms and duration + query text for queries slower than 500ms (we can easily avoid redundancy). This allows you to have all query statistics without too much I/O. It doesn't give you all the information as you don't have the query text but it can give you useful statistics (to explain why the database server is suddenly slower - we executed 2 millions queries last month - now we execute 8 millions queries. We cannot easily have this sort of information as a log_min_duration_statement = 500 won't give it and a log_min_duration_statement = 0 generates far too I/O). For another real life example, we are switching from a manual vacuum to autovacuum for one of our customers and we want to know the consequences. The query text is useless for a first analysis (and will slow down the database too much) as we just want to detect if it's slower or not. Could we consider reintroduce the old behaviour rather than removing this setting (we can rename it to a better name if needed)? I already have a patch for that as we run a patched version of 8.1.4 here to have this very behaviour. -- Guillaume ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] log_duration is redundant, no?
Guillaume Smet [EMAIL PROTECTED] writes: I mean: log_duration = on log_min_duration_statement = 500 would log only duration for queries faster than 500 ms and duration + query text for queries slower than 500ms (we can easily avoid redundancy). I don't find this very persuasive --- it sounds awfully messy, and in fact isn't that exactly the old behavior we got rid of because no one could understand it? regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] log_duration is redundant, no?
On Thu, Sep 07, 2006 at 06:06:51PM -0400, Tom Lane wrote: Guillaume Smet [EMAIL PROTECTED] writes: I mean: log_duration = on log_min_duration_statement = 500 would log only duration for queries faster than 500 ms and duration + query text for queries slower than 500ms (we can easily avoid redundancy). I don't find this very persuasive --- it sounds awfully messy, and in fact isn't that exactly the old behavior we got rid of because no one could understand it? Guillaume's the author of pgfouine, which understands it and helps others to do same. Cheers, D -- David Fetter [EMAIL PROTECTED] http://fetter.org/ phone: +1 415 235 3778AIM: dfetter666 Skype: davidfetter Remember to vote! ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] log_duration is redundant, no?
Tom, I don't find this very persuasive --- it sounds awfully messy, and in fact isn't that exactly the old behavior we got rid of because no one could understand it? Well, we want analogous functionality. We could stand to have it named/organized differently. But maybe we should hold those chages for 8.3, so that they can be tested properly? I am finding that the log format prior to Bruce's change, which we were using for TPCE, makes it very hard to do log digest analysis if you use SPs or prepared queries at all. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] log_duration is redundant, no?
On 9/8/06, Tom Lane [EMAIL PROTECTED] wrote: I don't find this very persuasive --- it sounds awfully messy, and in fact isn't that exactly the old behavior we got rid of because no one could understand it? I gave real use cases and we use it every day. It really helps us as a PostgreSQL hosting company. The fact is that no tool could really exploit this behaviour before. I agree it's a totally useless information if you don't have a tool to analyze the logs. This is no longer the case as pgFouine can extract this information and make it useful by aggregating it. Perhaps we could rename it to log_all_duration (my english is not that good so I'm not sure it's a good name) and explain how it can be useful in the documentation. -- Guillaume ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] log_duration is redundant, no?
Guillaume Smet wrote: On 9/8/06, Tom Lane [EMAIL PROTECTED] wrote: I don't find this very persuasive --- it sounds awfully messy, and in fact isn't that exactly the old behavior we got rid of because no one could understand it? I gave real use cases and we use it every day. It really helps us as a PostgreSQL hosting company. The fact is that no tool could really exploit this behaviour before. I agree it's a totally useless information if you don't have a tool to analyze the logs. This is no longer the case as pgFouine can extract this information and make it useful by aggregating it. Perhaps we could rename it to log_all_duration (my english is not that good so I'm not sure it's a good name) and explain how it can be useful in the documentation. If you are using an external tool, can't you just restrict what you display based on the logged duration? -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] log_duration is redundant, no?
Bruce Momjian [EMAIL PROTECTED] writes: If you are using an external tool, can't you just restrict what you display based on the logged duration? I think his basic complaint is that doing the full logging pushup for even short-duration queries is too expensive, and that logging only the duration and not the query text or parameters makes a significant speed difference. I'm not at all sure that I buy that, but if it's true then subsequent filtering obviously doesn't help. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] log_duration is redundant, no?
On 9/8/06, Bruce Momjian [EMAIL PROTECTED] wrote: If you are using an external tool, can't you just restrict what you display based on the logged duration? It's not a matter of having too much information in our reports (the more information I have, the happier I am :)). It's a matter of slowing down too much the server with too much I/O. We can afford to log every duration and queries slower than 500ms nearly without any overhead. We can't afford to log every query, it generates too much I/O - note that we tried to do it and it was really too slow. With the former configuration we log 1.2 GB/day, with the latter I suspect it will be far more than 60 GB/day (I don't have the exact number as we can't do it for real but queries slower than 500 ms represents 1/100 of the total amount of queries). Query logging is really a nice way to monitor the activity of a PostgreSQL server and the overhead is not that high if logging I/O are not too intensive. -- Guillaume ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] log_duration is redundant, no?
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: If you are using an external tool, can't you just restrict what you display based on the logged duration? I think his basic complaint is that doing the full logging pushup for even short-duration queries is too expensive, and that logging only the duration and not the query text or parameters makes a significant speed difference. I'm not at all sure that I buy that, but if it's true then subsequent filtering obviously doesn't help. Well, except for bind, all the log output display is zero cost, just a printf(), as I remember. The only cost that is significant, I think, is the timing of the query, and that is happening for all the setttings discussed. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] log_duration is redundant, no?
Bruce Momjian [EMAIL PROTECTED] writes: Well, except for bind, all the log output display is zero cost, just a printf(), as I remember. The only cost that is significant, I think, is the timing of the query, and that is happening for all the setttings discussed. On a machine with slow gettimeofday(), measuring duration at all is going to hurt, but apparently that is not Guillaume's situation --- what's costing him is sheer log volume. And remember that the slow-gettimeofday problem exists mainly on cheap PCs, not server-grade hardware. Based on his experience I'm prepared to believe that there is a use-case for logging just the duration for short queries. It seems like we should either remove the separate log_duration boolean or make it work as he suggests. I'm leaning to the second answer now. What's your vote? regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] log_duration is redundant, no?
On 9/8/06, Tom Lane [EMAIL PROTECTED] wrote: I think his basic complaint is that doing the full logging pushup for even short-duration queries is too expensive, and that logging only the duration and not the query text or parameters makes a significant speed difference. I'm not at all sure that I buy that, but if it's true then subsequent filtering obviously doesn't help. That's exactly my point. And on our highly loaded servers, the log_duration behaviour makes the difference between: - we have a clear overview of the server activity and - we don't have any idea of what happens on this server (apart that there are queries slower than X ms). Regards, -- Guillaume ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] log_duration is redundant, no?
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Well, except for bind, all the log output display is zero cost, just a printf(), as I remember. The only cost that is significant, I think, is the timing of the query, and that is happening for all the setttings discussed. On a machine with slow gettimeofday(), measuring duration at all is going to hurt, but apparently that is not Guillaume's situation --- what's costing him is sheer log volume. And remember that the slow-gettimeofday problem exists mainly on cheap PCs, not server-grade hardware. Based on his experience I'm prepared to believe that there is a use-case for logging just the duration for short queries. It seems like we should either remove the separate log_duration boolean or make it work as he suggests. I'm leaning to the second answer now. What's your vote? #2, I think, but I am confused if you don't know the query, how valuable is the log_duration. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] log_duration is redundant, no?
Bruce Momjian wrote: #2, I think, but I am confused if you don't know the query, how valuable is the log_duration. Statistics? -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] log_duration is redundant, no?
Alvaro Herrera wrote: Bruce Momjian wrote: #2, I think, but I am confused if you don't know the query, how valuable is the log_duration. Statistics? Oh, interesting. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] log_duration and log_statement
On Mon, 2006-03-13 at 23:40 +0100, Guillaume Smet wrote: Here are some background information to explain our issue and request. We are currently planning a migration from PostgreSQL 7.4 to PostgreSQL 8.1. We work on a medium sized database (2GB) with a rather important activity (12 millions queries a day with peaks up to 1000 queries/s). We are analyzing the logs with a tool we developed (namely pgFouine available on pgFoundry). Guillaume, Thanks very much for writing pgFouine. We've been doing our best to support generation of useful logs for performance analysis, so please feel free to ask for anything you see a need for. We currently use the following configuration for logging: - log_min_duration_statement = 500 to log the slowest queries - log_duration to log every query duration and have a global overview of our database activity (used to generate this sort of graphs: http://people.openwide.fr/~gsmet/postgresql/graphs.html ). Interesting results and good graphics too. I note your graphs don't show missing values: on the bottom graph there is no data for 7pm and 2am, yet the graph passes directly from 6 to 8pm as if the figure for 7pm was mid-way between them, rather than zero. We cannot log every query as we already generate 1.2GB of logs a day while only logging the text of one hundredth of the queries so we log only the duration for the not so slow queries. How do you tell the difference between a SELECT and a Write query when you do not see the text of the query? I didn't notice the log_duration behaviour has changed starting from 8.0 (thanks to oicu for pointing me the 8.0 release notes on #postgresql) and what we did is not possible anymore with 8.x as log_duration now only logs the duration for queries logged with log_statement. I think the former behaviour can be interesting in our case and probably for many other people out there who use log analysis tools as logging only slow queries is not enough to have an overview of the database activity. I was thinking about something like log_duration = 'none|logged|all' which will allow us to switch between: - none: we don't log the duration (=log_duration=off); - logged: we log the duration only for logged queries (depending on log_statement as for 8.0); - all: we log every duration as 7.4 did before when log_duration was on. Any comment on this? I think I need more persuasion before I see the value of this, but I'm not going to immediately disregard this either. Collecting information is interesting, but it must in some way lead to a rational corrective action. Logging the duration of statements without recording what they are would tell you there was a problem but make it difficult to move towards an action rationally. Perhaps I'm misunderstanding this. Is the issue that the log volume is too high? We might be able to look at ways to reduce/compress the log volume for duration/statement logging. Another thought might be to provide a random sample of queries. A 10% sample would be sufficient for your case here and yet would provide a possibility of deeper analysis also. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] log_duration and log_statement
Guillaume Smet [EMAIL PROTECTED] wrote Here are some background information to explain our issue and request. On a separate issue, seems in pgfouine homepage a typo is there for a while: What's New 2005-02-11 - pgFouine 0.4.99 released ... 2005-01-10 - pgFouine 0.2.1 released ... I guess they should read as 2006 :-) Regards, Qingqing ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] log_duration and log_statement
On 3/14/06, Qingqing Zhou [EMAIL PROTECTED] wrote: I guess they should read as 2006 :-) Sure. Will fix it this evening. Thanks. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] log_duration and log_statement
Simon, On 3/14/06, Simon Riggs [EMAIL PROTECTED] wrote: Thanks very much for writing pgFouine. We've been doing our best to support generation of useful logs for performance analysis, so please feel free to ask for anything you see a need for. OK. Thanks. I should write something clear about what we need one day or another. The two problems I have in mind currently are: - problem to isolate a transaction as a whole. I mean I'd like to have something to aggregate the BEGIN; QUERY1; QUERY2; COMMIT; as a whole ; - problem with the way queries are logged when using PG 8.1 and the 8.1 jdbc driver. All our queries are logged as PREPARE and EXECUTE and we don't have the variables passed in the log (so no value for $1, $2... and we have the same problem to aggregate the real time of a query) but I saw a few threads on -hackers talking about this problem. I note your graphs don't show missing values: on the bottom graph there is no data for 7pm and 2am, yet the graph passes directly from 6 to 8pm as if the figure for 7pm was mid-way between them, rather than zero. Which is obviouly wrong. I missed it as it's not a common case on our db. Thanks for the report. How do you tell the difference between a SELECT and a Write query when you do not see the text of the query? A SELECT is a query beginning with SELECT. It's far from being perfect but it works most of the time for our projects. Our main problem is for queries like SELECT update_tables(); which are inserts/delete in a stored procedure. If you know any way to do it better, I'm open to any suggestion. I don't know if it's possible for the backend to log this sort of information. I think I need more persuasion before I see the value of this, but I'm not going to immediately disregard this either. Collecting information is interesting, but it must in some way lead to a rational corrective action. Logging the duration of statements without recording what they are would tell you there was a problem but make it difficult to move towards an action rationally. Perhaps I'm misunderstanding this. IMHO, the duration information are interesting as I can see when my database is heavily loaded which is not correlated with the http requests (two levels of cache) or with the server load on this particular website. Let's admit we have a slow down between 1am and 2pm. There can be a lot of reasons for that: - a lot of selects at this time because we are under heavy load from our website; - a lot of updates (cronjobs for example); - another unknow reason we should try to find. Having duration for all my queries can help me to understand the problem as I will know: - how many queries the database really executed at this particular time; - how slow they are. If I don't have a global slow down, perhaps, at this time I have a specific table locked for an unknown reason. I agree with you that it won't give me all the information to solve my problem but _it allows me to detect the problem_ and it can at least give me guidelines to how I can detect it more accurately. It also gives us a way to see the evolution from one day to another as we just operates the database, we don't develop the website. If I have a boost from 1 million queries a day to 20 millions queries a day, I won't know it with log_min_duration_statement and I can detect it if log_duration logs every query. Is the issue that the log volume is too high? We might be able to look at ways to reduce/compress the log volume for duration/statement logging. Well, I'm not sure we can reduce the size of a syslog log but perhaps I'm wrong. Our problem is both the volume and the performance drop introduced by logging. Another thought might be to provide a random sample of queries. A 10% sample would be sufficient for your case here and yet would provide a possibility of deeper analysis also. Why not but I won't have a global overview just a statistical information and I'd really like to have this overview without logging every query with log_statement (which we do sometimes for a few minutes when we really need more information about a specific problem). -- Guillaume ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] log_duration and log_statement
On Tue, Mar 14, 2006 at 08:50:22AM +, Simon Riggs wrote: Is the issue that the log volume is too high? We might be able to look at ways to reduce/compress the log volume for duration/statement logging. ISTM that for performance analysis there's probably a better alternative than just dumping query statements to a logfile. The information we really want is stuff like: WAL sync's per second Read-only queries per second Queries that modify data per second Transactions per second Blacks written per second (maybe broken down by WAL, heap and index) etc... Doesn't really have anything to do with logging query strings passed into the parser. I know dtrace could be used to provide this kind of info, but it's only on Solaris (and eventually FreeBSD). But it should be possible to provide our own version of that for platforms that don't have something like dtrace. Or perhaps there's some entirely different statistics collection method we could come up with... -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] log_duration and log_statement
Hello, Here are some background information to explain our issue and request. We are currently planning a migration from PostgreSQL 7.4 to PostgreSQL 8.1. We work on a medium sized database (2GB) with a rather important activity (12 millions queries a day with peaks up to 1000 queries/s). We are analyzing the logs with a tool we developed (namely pgFouine available on pgFoundry). We currently use the following configuration for logging: - log_min_duration_statement = 500 to log the slowest queries - log_duration to log every query duration and have a global overview of our database activity (used to generate this sort of graphs: http://people.openwide.fr/~gsmet/postgresql/graphs.html ). We cannot log every query as we already generate 1.2GB of logs a day while only logging the text of one hundredth of the queries so we log only the duration for the not so slow queries. I didn't notice the log_duration behaviour has changed starting from 8.0 (thanks to oicu for pointing me the 8.0 release notes on #postgresql) and what we did is not possible anymore with 8.x as log_duration now only logs the duration for queries logged with log_statement. I think the former behaviour can be interesting in our case and probably for many other people out there who use log analysis tools as logging only slow queries is not enough to have an overview of the database activity. I was thinking about something like log_duration = 'none|logged|all' which will allow us to switch between: - none: we don't log the duration (=log_duration=off); - logged: we log the duration only for logged queries (depending on log_statement as for 8.0); - all: we log every duration as 7.4 did before when log_duration was on. Any comment on this? Thanks in advance for considering my request. Regards, -- Guillaume ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] log_duration
Added to TODO: * Add GUC log_statement_duration to print statement and = min duration --- Bruce Momjian wrote: Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: One nice thing is that each element is orthoginal. But, for the functionality desired, we have to merge log_statement and log_duration and have it print for statements taking over X milliseconds. I have no problem adding it, but it has to be clear it isn't orthoginal but is a conditional combination of two other parameters. Actually, I was wondering if we shouldn't *replace* the current log_duration with a combined form (that specifies a minimum interesting duration). I can't quite see the need for orthogonality here. The only reason you'd care about query duration is that you're looking for the slow ones, no? So why bother logging the fast ones? Besides, you can specify min-duration zero if you really want 'em all. We did talk about this a while ago, and folks wanted the query printed _before_ it was executed, so they could see the query in the logs at the time it was issued, both for monitoring and for showing the time the query started when log_timestamp is enabled. Seems the clearest option would be for log_duration to print the query string too, and convert it to an integer field. I can see zero meaning print all queries and durations. What value do we use to turn it off? -1? This would give us log_statement that prints at query start, and log_duration that prints query and duration at query end. How is that? Maybe we should rename them as log_statement_start and log_statement_duration. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] log_duration
Christopher Kings-Lynne wrote: Someone asked about this at FOSDEM. The only way I know to do it is look in the pgsql_temp directory, but they disappear pretty quickly. Folks, do we need something to report sort file usage? Fwiw here's the perl one-liner I used to tune sort_mem recently, (run in the $PGDATA/base directory): perl -e 'while (sleep(1)) {if ($s = -s pgsql_tmp/*) { if ($s $m) { $m = $s; print $s\n; } } }' When doing this I had precisely the same thought about having Postgres print out the disk space usage for sorts. -- greg ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] log_duration
Greg Stark wrote: Not a big deal though, since I doubt anyone's actually parsing postgres logs. Hm, brings up an interesting idea though, I wonder if it would be useful to log directly into postgres tables. I was wondering roughly the same thing. If you ran an external program to process the logs and put them into a PostgreSQL database, you'd have problems with the transactions of the log processor landing in the logs as well, at least if all transactions were logged. The logging process would have to filter out its own transactions, which might not be all that easy. -- Kevin Brown [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] log_duration
Bruce Momjian [EMAIL PROTECTED] writes: One nice thing is that each element is orthoginal. But, for the functionality desired, we have to merge log_statement and log_duration and have it print for statements taking over X milliseconds. I have no problem adding it, but it has to be clear it isn't orthoginal but is a conditional combination of two other parameters. Actually, I was wondering if we shouldn't *replace* the current log_duration with a combined form (that specifies a minimum interesting duration). I can't quite see the need for orthogonality here. The only reason you'd care about query duration is that you're looking for the slow ones, no? So why bother logging the fast ones? Besides, you can specify min-duration zero if you really want 'em all. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] log_duration
One option is to have log_query output an identifier with the query such as a hash of the query or the pointer value for the plan, suppressing duplicates. Then log_duration prints the identifier with the duration. Actually, log_pid is the proper way to do this. You can then add log connections, and get a full snapshot of what is happening for that session. Personally I would prefer a unique identifier. I guess the best way of illustrating my intuition would be: Imagine loading all this data into a relational database, what would you need to full normalize it?. Parsing log files programmatically is much easier if you have unique identifiers instead of having to rely on the relative relationships of entries in the log. Not a big deal though, since I doubt anyone's actually parsing postgres logs. Hm, brings up an interesting idea though, I wonder if it would be useful to log directly into postgres tables. -- greg ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] log_duration
Greg Stark wrote: Tom Lane [EMAIL PROTECTED] writes: Christopher Kings-Lynne [EMAIL PROTECTED] writes: Looking at the log_duration postgresql.conf option. How about adding an option log_duration_min which is a value in milliseconds that is the minimum time a query must run for before being logged. Fine with me --- but you'll need to add more logic than that. Right now, log_duration *only* causes the query duration to be printed out; if you ain't got log_statement on, you're in the dark as to what the query itself was. You'll need to add some code to print the query (the log_min_error_statement logic might be a useful source of inspiration). Not sure how this should interact with the case where log_duration is set and the min-duration isn't. But maybe that case is silly, and we should just redefine log_duration as a minimum runtime that causes the query *and* its runtime to be printed to the log. Tom is right here. log_duration _just_ prints the duration, so we would need to basically create a merged param that does log_duration and log_statement and have it activate only if the statement takes more than X milliseconds, something like log_long_statement, or something like that. Here are the log_* params we have: log_connections = false log_hostname = false log_source_port = false log_pid = false log_statement = false log_duration = false log_timestamp = false Basically, log_pid pulls them all together. Without that, you don't have any way to pull together individual lines in the log, and with pid wraparound, you can't even do that 100%. I wonder if we should put a number before the pid and increment it on every pid wraparound. One nice thing is that each element is orthoginal. But, for the functionality desired, we have to merge log_statement and log_duration and have it print for statements taking over X milliseconds. I have no problem adding it, but it has to be clear it isn't orthoginal but is a conditional combination of two other parameters. Is it even guaranteed to be properly ordered on a busy server with multiple processors anyways? One option is to have log_query output an identifier with the query such as a hash of the query or the pointer value for the plan, suppressing duplicates. Then log_duration prints the identifier with the duration. This means on a busy server running lots of prepared queries you would see a whole bunch of queries on startup, then hopefully no durations. Any durations printed could cause alarms to go off. To find the query you grep the logs for the identifier in the duration message. Actually, log_pid is the proper way to do this. You can then add log connections, and get a full snapshot of what is happening for that session. This only really works if you're using prepared queries everywhere. But in the long run that will be the case for OLTP systems, which is where log_duration is really useful. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] log_duration
Tom is right here. log_duration _just_ prints the duration, so we would need to basically create a merged param that does log_duration and log_statement and have it activate only if the statement takes more than X milliseconds, something like log_long_statement, or something like that. Here are the log_* params we have: log_connections = false log_hostname = false log_source_port = false log_pid = false log_statement = false log_duration = false log_timestamp = false OK, while I'm doing all this benchmarking and stuff - is there any sort of option where I can see it logged when a sort doesn't have enought sort memory and hence hits the disk? eg. an elog(LOG) is emitted? Chris ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] log_duration
Christopher Kings-Lynne wrote: Tom is right here. log_duration _just_ prints the duration, so we would need to basically create a merged param that does log_duration and log_statement and have it activate only if the statement takes more than X milliseconds, something like log_long_statement, or something like that. Here are the log_* params we have: log_connections = false log_hostname = false log_source_port = false log_pid = false log_statement = false log_duration = false log_timestamp = false OK, while I'm doing all this benchmarking and stuff - is there any sort of option where I can see it logged when a sort doesn't have enought sort memory and hence hits the disk? eg. an elog(LOG) is emitted? Someone asked about this at FOSDEM. The only way I know to do it is look in the pgsql_temp directory, but they disappear pretty quickly. Folks, do we need something to report sort file usage? -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] log_duration
Someone asked about this at FOSDEM. The only way I know to do it is look in the pgsql_temp directory, but they disappear pretty quickly. Folks, do we need something to report sort file usage? How about a new GUC variable: log_sort_tempfiles And in the code that creates the temp file, if the GUC variable is true, then do: elog(LOG, Sort needed temp file. Sort required 2456K. Try increasing sort_mem.); Or something? Chris ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] log_duration
Well, part of the issue here is that it isn't always bad to use sort file; certainly it is better to use them than to swap. We have a checkpoint_warning in 7.4 that will warn about excessive checkpointing. What would our criteria be for warning about sort_mem? Seems we would have to know how much free memory there is available, and in fact, if there is lots of free memory, the sort files will just sit in the kernel disk cache anyway. I am not saying this is a bad idea --- we just need to define it clearer. --- Christopher Kings-Lynne wrote: Someone asked about this at FOSDEM. The only way I know to do it is look in the pgsql_temp directory, but they disappear pretty quickly. Folks, do we need something to report sort file usage? How about a new GUC variable: log_sort_tempfiles And in the code that creates the temp file, if the GUC variable is true, then do: elog(LOG, Sort needed temp file. Sort required 2456K. Try increasing sort_mem.); Or something? Chris -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] log_duration
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: One nice thing is that each element is orthoginal. But, for the functionality desired, we have to merge log_statement and log_duration and have it print for statements taking over X milliseconds. I have no problem adding it, but it has to be clear it isn't orthoginal but is a conditional combination of two other parameters. Actually, I was wondering if we shouldn't *replace* the current log_duration with a combined form (that specifies a minimum interesting duration). I can't quite see the need for orthogonality here. The only reason you'd care about query duration is that you're looking for the slow ones, no? So why bother logging the fast ones? Besides, you can specify min-duration zero if you really want 'em all. We did talk about this a while ago, and folks wanted the query printed _before_ it was executed, so they could see the query in the logs at the time it was issued, both for monitoring and for showing the time the query started when log_timestamp is enabled. Seems the clearest option would be for log_duration to print the query string too, and convert it to an integer field. I can see zero meaning print all queries and durations. What value do we use to turn it off? -1? This would give us log_statement that prints at query start, and log_duration that prints query and duration at query end. How is that? Maybe we should rename them as log_statement_start and log_statement_duration. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] log_duration
Tom Lane [EMAIL PROTECTED] writes: Christopher Kings-Lynne [EMAIL PROTECTED] writes: Looking at the log_duration postgresql.conf option. How about adding an option log_duration_min which is a value in milliseconds that is the minimum time a query must run for before being logged. Fine with me --- but you'll need to add more logic than that. Right now, log_duration *only* causes the query duration to be printed out; if you ain't got log_statement on, you're in the dark as to what the query itself was. You'll need to add some code to print the query (the log_min_error_statement logic might be a useful source of inspiration). Not sure how this should interact with the case where log_duration is set and the min-duration isn't. But maybe that case is silly, and we should just redefine log_duration as a minimum runtime that causes the query *and* its runtime to be printed to the log. Is it even guaranteed to be properly ordered on a busy server with multiple processors anyways? One option is to have log_query output an identifier with the query such as a hash of the query or the pointer value for the plan, suppressing duplicates. Then log_duration prints the identifier with the duration. This means on a busy server running lots of prepared queries you would see a whole bunch of queries on startup, then hopefully no durations. Any durations printed could cause alarms to go off. To find the query you grep the logs for the identifier in the duration message. This only really works if you're using prepared queries everywhere. But in the long run that will be the case for OLTP systems, which is where log_duration is really useful. -- greg ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
[HACKERS] log_duration
Hi guys, Looking at the log_duration postgresql.conf option. How about adding an option log_duration_min which is a value in milliseconds that is the minimum time a query must run for before being logged. Basically, what I'd be interested in is please log the SQL query and duration of all queries that last longer than 500ms. That way I can quickly find badly formed queries, index them, etc... Idea? Chris ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] log_duration
Christopher Kings-Lynne [EMAIL PROTECTED] writes: Looking at the log_duration postgresql.conf option. How about adding an option log_duration_min which is a value in milliseconds that is the minimum time a query must run for before being logged. Fine with me --- but you'll need to add more logic than that. Right now, log_duration *only* causes the query duration to be printed out; if you ain't got log_statement on, you're in the dark as to what the query itself was. You'll need to add some code to print the query (the log_min_error_statement logic might be a useful source of inspiration). Not sure how this should interact with the case where log_duration is set and the min-duration isn't. But maybe that case is silly, and we should just redefine log_duration as a minimum runtime that causes the query *and* its runtime to be printed to the log. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html