Re: [GENERAL] suggestion: log_statement = sample

2009-07-21 Thread Janning Vygen
On Monday 20 July 2009 18:58:21 Greg Sabino Mullane wrote:
 Perhaps, but I don't think you've quite overcome the 'log everything'
 counter-argument. 
#
Not everybody can afford a system with lots of raid arrays or dedicated logging 
boxes. Many people log to the same disk. I do it in some projects. 

So log_sample gives you quite as much log analysing as full logging but 
without the I/O overhead. 

Maybe most of the postgresql guys have lots of hardware with RAID 10 SCSI 
Disks but reality sometimes differ. we run most of our boxes on IDE RAID 1 
without additional disks for logging. In this scenario you just can't log 
everything. log_min_duration gives you only a view to long running queries but 
that is just on half of the whole picture.

kind regards
Janning


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] suggestion: log_statement = sample

2009-07-21 Thread Janning Vygen
On Monday 20 July 2009 19:24:13 Bill Moran wrote:
   It is not possible for us. Logging millions of statements take too much
   time.

 This is a ridiculous statement.  In actual practice, full query logging
 is 1/50 the amount of disk I/O as the actual database activity.  If your
 systems are so stressed that they can't handle another 2% increase, then
 you've got bigger problems lurking.

 Have you benchmarked the load it creates under your workload?

Yes, it takes up to 15% of our workload in an average use case. But we have 
peak times where we can not afford 15% lost for logging! 

And if we log on the same disk, it slows down reading and writing. So it does 
not really matter how much work load it creates. If it slows down my response, 
i dont want to log everything. (I know that logging onto the same disk is 
bad!)

 Overall, it seems like you've decided that you want this feature and
 nothing else will do.  If that's the case, then just go ahead and write it.

It was really just a suggestion. I know that i can handle my problems in a 
different way like doing full logging from time to time, better automatic 
tests, more hardware and so on. I just thought, it couldnt be so hard to log 
every nth statement instead of all. And i think that there are lot of 
usescases out there for a directive like this. I really just wanted to help to 
make postgresql better. Even if it would be implemented soon, it wouldn't help 
me. As i need to analyze my load now! 

kind regards
Janning


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] suggestion: log_statement = sample

2009-07-21 Thread Tom Lane
Janning Vygen vy...@kicktipp.de writes:
 On Monday 20 July 2009 19:24:13 Bill Moran wrote:
 Have you benchmarked the load it creates under your workload?

 Yes, it takes up to 15% of our workload in an average use case. But we have 
 peak times where we can not afford 15% lost for logging! 

Well, you could turn it off during the peak times.  Or buy better
hardware --- if you have less than 15% headroom then it's past time
to be doing that anyway.

You really haven't given any convincing case for this behavior...

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] suggestion: log_statement = sample

2009-07-21 Thread Janning Vygen
On Tuesday 21 July 2009 15:49:36 Tom Lane wrote:
 Janning Vygen vy...@kicktipp.de writes:
  On Monday 20 July 2009 19:24:13 Bill Moran wrote:
  Have you benchmarked the load it creates under your workload?
 
  Yes, it takes up to 15% of our workload in an average use case. But we
  have peak times where we can not afford 15% lost for logging!

 Well, you could turn it off during the peak times. 

It affords a server restart which is not a good idea. And i will never get any 
real time analyze of what happens at peak time.

 Or buy better
 hardware --- if you have less than 15% headroom then it's past time
 to be doing that anyway.

yes, you are right, but software solutions are so much nicer...

 You really haven't given any convincing case for this behavior...

that's fine more me. I still don't understand it why it was not convincing. I 
guess you guys run your DB with lots of hardware power, RAIDs and NAS and so 
on. But there are many people running it on small boxes with one IDE hard 
disk. Switching full logging on with such hardware is like shutting it down as 
IO is blocked! I still think it has reasonable uses cases without lots of 
effort (just a guess, i havn't looked at the source code yet).

But, hey: I just wanted to help improving postgresql. If you think this is not 
an improvment, its fine for me. Thanks to all for discussing it.

kind regards
Janning


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] suggestion: log_statement = sample

2009-07-21 Thread Tom Lane
Janning Vygen vy...@kicktipp.de writes:
 On Tuesday 21 July 2009 15:49:36 Tom Lane wrote:
 Well, you could turn it off during the peak times. 

 It affords a server restart which is not a good idea.

Changing logging options does not require a server restart.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] suggestion: log_statement = sample

2009-07-21 Thread Merlin Moncure
On Mon, Jul 20, 2009 at 1:28 PM, Joshua D. Drakej...@commandprompt.com wrote:
 On Mon, 2009-07-20 at 13:24 -0400, Bill Moran wrote:
   In actual practice, full query logging
 is 1/50 the amount of disk I/O as the actual database activity.  If your
 systems are so stressed that they can't handle another 2% increase, then
 you've got bigger problems lurking.

 It depends on the system. I have seen even big systems take a huge hit
 by full logging due to transactional velocity.

 Joshua D. Drake

I've seen this too.  On high transaction load systems, log_statement
has big overhead.  I haven't tested this lately though, and I suspect
this information is version dependent (in the 'old' days, it used to
be a really big deal).

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] suggestion: log_statement = sample

2009-07-21 Thread Janning Vygen
On Tuesday 21 July 2009 18:09:57 you wrote:
 Janning Vygen vy...@kicktipp.de writes:
  On Tuesday 21 July 2009 15:49:36 Tom Lane wrote:
  Well, you could turn it off during the peak times.
 
  It affords a server restart which is not a good idea.

 Changing logging options does not require a server restart.

great. I didn't know that. thank you!

Janning



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] suggestion: log_statement = sample

2009-07-20 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-   
Hash: RIPEMD160  


 In my experience, I've found that enabling full logging for a short time
 (perhaps a few hours) gathers enough data to run through tools like 
 pgFouine and find problem areas.

 It is not possible for us. Logging millions of statements take too much time.

Too much time to do what? Where is the bottleneck?

 But you dont run the real use cases with automated tests. There so many 
 factors involved in real time: caching, concurrency, data, peaktime,
 deadlocks, doubleclicks, robots etc. that you just can't reproduce it on a 
 development system without lots of effort. 

Agreed.

 Additionally, we make
 sure our production systems have enough hardware behind them that we can
 add additional tasks without it affecting production use.   

 that's nice, but not everybody can afford it. Of course i would love to log 
 every statement. But do you really log every statement in production? I guess 
 not.  

We have clients that do, and we always recommend it for all of our clients. It
can get very expensive (hint: ship them off with syslog over an internal
network to a dedicated box), but being able to see exactly what happened
in your database when something goes wrong after the fact is invaluable.
Plus we can then run tools like pgsi (http://bucardo.org/pgsi/) to get
nice statistics and find weak spots in the application code.

 Again: for my use case it makes sense to have a log_sample feature.

Perhaps, but I don't think you've quite overcome the 'log everything'
counter-argument. I don't think log_sample is necessarily a bad idea, but I
doubt it is going to happen soon, and certainly not in time to put
on your production system in the next year or so. Perhaps you can attack
this from another angle, such as using some sort of filter or script
to grab every X lines from the logs and discard the rest while in
full logging mode. That would have the advantage of being fully
backwards-compatible and usable today.


- --
Greg Sabino Mullane g...@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200907201256
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-BEGIN PGP SIGNATURE-

iEYEAREDAAYFAkpkoh4ACgkQvJuQZxSWSsi5swCfSgxE/3Vs+kCmfqSERL6u84XJ
nNQAoO5oOs/Nwuhe27FQ+THZEUVcdULO
=7JPQ
-END PGP SIGNATURE-



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] suggestion: log_statement = sample

2009-07-20 Thread Bill Moran
In response to Greg Sabino Mullane g...@turnstep.com:
 
 -BEGIN PGP SIGNED MESSAGE-   
 Hash: RIPEMD160  
 
 
  In my experience, I've found that enabling full logging for a short time
  (perhaps a few hours) gathers enough data to run through tools like 
  pgFouine and find problem areas.
 
  It is not possible for us. Logging millions of statements take too much 
  time.

This is a ridiculous statement.  In actual practice, full query logging
is 1/50 the amount of disk I/O as the actual database activity.  If your
systems are so stressed that they can't handle another 2% increase, then
you've got bigger problems lurking.

Have you benchmarked the load it creates under your workload?

  But you dont run the real use cases with automated tests. There so many 
  factors involved in real time: caching, concurrency, data, peaktime,
  deadlocks, doubleclicks, robots etc. that you just can't reproduce it on a 
  development system without lots of effort. 

Logging a sample query every arbitrary number of queries isn't a real use
case either, thus your counter argument makes no sense.

I suggested the test system because it's a good compromise.

  Additionally, we make
  sure our production systems have enough hardware behind them that we can
  add additional tasks without it affecting production use.   
 
  that's nice, but not everybody can afford it. Of course i would love to log
  every statement. But do you really log every statement in production? I 
  guess
  not.

Try reading my responses instead of guessing.  We run full query logging on
occasion to get a sample of real usage on our production systems.  As a
result, we know that our servers can handle the load.  We're working on
the infrastructure to manage the amount of data so we can do it all the
time (we don't currently have enough disk space).

Overall, it seems like you've decided that you want this feature and nothing
else will do.  If that's the case, then just go ahead and write it.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] suggestion: log_statement = sample

2009-07-20 Thread Joshua D. Drake
On Mon, 2009-07-20 at 13:24 -0400, Bill Moran wrote:
 In response to Greg Sabino Mullane g...@turnstep.com:
  
  -BEGIN PGP SIGNED MESSAGE-   
  Hash: RIPEMD160  
  
  
   In my experience, I've found that enabling full logging for a short time
   (perhaps a few hours) gathers enough data to run through tools like 
   pgFouine and find problem areas.
  
   It is not possible for us. Logging millions of statements take too much 
   time.
 
 This is a ridiculous statement.

No it isn't.

   In actual practice, full query logging
 is 1/50 the amount of disk I/O as the actual database activity.  If your
 systems are so stressed that they can't handle another 2% increase, then
 you've got bigger problems lurking.

It depends on the system. I have seen even big systems take a huge hit
by full logging due to transactional velocity.

Joshua D. Drake

-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] suggestion: log_statement = sample

2009-07-20 Thread Steve Atkins


On Jul 20, 2009, at 10:24 AM, Bill Moran wrote:


In response to Greg Sabino Mullane g...@turnstep.com:


-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


In my experience, I've found that enabling full logging for a  
short time

(perhaps a few hours) gathers enough data to run through tools like
pgFouine and find problem areas.


It is not possible for us. Logging millions of statements take too  
much time.


This is a ridiculous statement.


No, it isn't.


 In actual practice, full query logging
is 1/50 the amount of disk I/O as the actual database activity.  If  
your
systems are so stressed that they can't handle another 2% increase,  
then

you've got bigger problems lurking.


That's really not true. Ever, probably, but certainly in my situation.

A lot of my inserts are large text fields (10k - 1M). Many of those  
are never

read again.

If I want to log all statements or slow statements then a lot of
what's logged is those large inserts. There's currently no way
to log the statement without logging all the data inserted, so
the log traffic produced by each insert is huge.

Logging via syslog on Solaris I've had reports of that slowing
the machine down to unusability. (I'm fairly sure I know why,
and I suspect you can guess too, but this is on customer machines).
The hit due to logging can be huge, even on fairly overpowered
systems.

There are a lot of limitations in the current logging system when
it comes to capturing data for performance analysis, and that's
one of them. There's certainly significant room for improvement
in the logging system - some of that can be added externally,
but some of it really needs to be done within the core code.

Cheers,
  Steve


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] suggestion: log_statement = sample

2009-07-20 Thread Bill Moran
In response to Joshua D. Drake j...@commandprompt.com:

 On Mon, 2009-07-20 at 13:24 -0400, Bill Moran wrote:
  In response to Greg Sabino Mullane g...@turnstep.com:
   
   -BEGIN PGP SIGNED MESSAGE-   
   Hash: RIPEMD160  
   
   
In my experience, I've found that enabling full logging for a short 
time
(perhaps a few hours) gathers enough data to run through tools like
 
pgFouine and find problem areas.   
 
   
It is not possible for us. Logging millions of statements take too much 
time.
  
  This is a ridiculous statement.
 
 No it isn't.
 
In actual practice, full query logging
  is 1/50 the amount of disk I/O as the actual database activity.  If your
  systems are so stressed that they can't handle another 2% increase, then
  you've got bigger problems lurking.
 
 It depends on the system. I have seen even big systems take a huge hit
 by full logging due to transactional velocity.

Perhaps I'm just in a foul mood today, but I feel like people are picking
my emails apart to make me the bad guy.  Note that you trimmed out a key
part of what I said here:

Have you benchmarked the load it creates under your workload?

Perhaps I should have explicitly said, There are likely some cases where
this statement isn't true, so you should benchmark your specific load
case, but you don't mention that you've done that.

Or, perhaps I should just abstain from posting to mailing lists when I'm
in a foul mood ...

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] suggestion: log_statement = sample

2009-07-20 Thread Joshua D. Drake
On Mon, 2009-07-20 at 13:46 -0400, Bill Moran wrote:
 In response to Joshua D. Drake j...@commandprompt.com:

  It depends on the system. I have seen even big systems take a huge hit
  by full logging due to transactional velocity.
 
 Perhaps I'm just in a foul mood today, but I feel like people are picking

Possibly. :)

 my emails apart to make me the bad guy.  Note that you trimmed out a key
 part of what I said here:

Right. I didn't have issue with the rest of your email. Its just the
idea what the person said was ridiculous.

 
 Have you benchmarked the load it creates under your workload?

Which is certainly a valid question.

 
 Perhaps I should have explicitly said, There are likely some cases where
 this statement isn't true, so you should benchmark your specific load
 case, but you don't mention that you've done that.
 
 Or, perhaps I should just abstain from posting to mailing lists when I'm
 in a foul mood ...

It has certainly gotten me in trouble more than once. There is a reason
they say, Don't go JD.

Joshua D. Drake




 
-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] suggestion: log_statement = sample

2009-07-16 Thread Bill Moran
In response to Janning Vygen vy...@kicktipp.de:

 hi,
 
 http://archives.postgresql.org/pgsql-general/2009-03/msg00581.php
 
 This was my suggestion about introducing a statment to get a sample of SQL 
 statements. Nobody answered yet. Why not? i think my suggestion would help a 
 lot. Or was it kind of stupid?

For my part, I don't think this would be useful.

Since most of your queries are run by software, you're going to see a
fairly predictable pattern to the queries, which means your sampling isn't
going to be anywhere near random, thus it will still be inaccurate and
incomplete.

In my experience, I've found that enabling full logging for a short time
(perhaps a few hours) gathers enough data to run through tools like
pgFouine and find problem areas.  Also, we have development servers that
run automated tests, and since it's not critical that they be performant,
we can run full query logging on them all the time.  Additionally, we make
sure our production systems have enough hardware behind them that we can
add additional tasks without it affecting production use.

All of these are (in my opinion) better approaches to the problem than
yet another arbitrary query filtering technique.  I mean, logging only
the most time-consuming queries is already arbitrary enough (as you
already stated).

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] suggestion: log_statement = sample

2009-07-16 Thread Janning Vygen
hi,

thanks for your comments on this.

On Thursday 16 July 2009 15:05:58 you wrote:
 In response to Janning Vygen vy...@kicktipp.de:
  hi,
 
  http://archives.postgresql.org/pgsql-general/2009-03/msg00581.php
 
  This was my suggestion about introducing a statment to get a sample of
  SQL statements. Nobody answered yet. Why not? i think my suggestion would
  help a lot. Or was it kind of stupid?

 For my part, I don't think this would be useful.

 Since most of your queries are run by software, you're going to see a
 fairly predictable pattern to the queries, which means your sampling isn't
 going to be anywhere near random, thus it will still be inaccurate and
 incomplete.

I dont think so. In my use case i will get a good sampling of queries as I 
could keep my log_sample running over long period of time. The sampling is in 
any case much better than with log_minduration while logging all statement is 
not acceptable in production.

 In my experience, I've found that enabling full logging for a short time
 (perhaps a few hours) gathers enough data to run through tools like
 pgFouine and find problem areas. 

It is not possible for us. Logging millions of statements take too much time.

 Also, we have development servers that
 run automated tests, and since it's not critical that they be performant,
 we can run full query logging on them all the time. 

But you dont run the real use cases with automated tests. There so many 
factors involved in real time: caching, concurrency, data, peaktime, 
deadlocks, doubleclicks, robots etc. that you just can't reproduce it on a 
development system without lots of effort.

 Additionally, we make
 sure our production systems have enough hardware behind them that we can
 add additional tasks without it affecting production use.

that's nice, but not everybody can afford it. Of course i would love to log 
every statement. But do you really log every statement in production? I guess 
not. 

 All of these are (in my opinion) better approaches to the problem than
 yet another arbitrary query filtering technique.  I mean, logging only
 the most time-consuming queries is already arbitrary enough (as you
 already stated).

With log_min duration i get only most time-consuming queries.
With log sample i can detect if there is a fast query which is called to 
often. This is impossible today.

Again: for my use case it makes sense to have a log_sample feature.

kind regards
Janning


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general