On Mon, Feb 25, 2013 at 8:26 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Sun, Feb 24, 2013 at 7:27 PM, Jim Nasby <j...@nasby.net> wrote:
>> We actually do that in our application and have discovered that random
>> sampling can end up significantly skewing your data.
>
> /me blinks.
>
> How so?
Sampling is a pretty big area of statistics. There are dozens of sampling methods to deal with the various problems that occur with different types of data distributions.

One problem is that if you have some very rare events, random sampling can produce odd results: those rare events will drop out entirely unless your sample is very large, whereas less rare events are represented proportionally. There are sampling methods that ensure that x% of the rare events are included even if those rare events make up less than x% of your total data set. One of those might be appropriate for profiling data when you're looking for rare slow queries amongst many faster queries.

--
greg
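
As a rough illustration of the rare-event problem described above, here is a minimal Python sketch comparing a simple random sample with a stratified one. Everything in it is an assumption made up for the example: the function names, the 1000 ms "slow query" threshold, and the synthetic durations. It is not taken from any PostgreSQL code or from the application mentioned in the thread.

```python
import random


def simple_random_sample(rows, fraction, rng):
    """Draw a plain random sample of roughly `fraction` of the rows."""
    k = min(len(rows), round(fraction * len(rows)))
    return rng.sample(rows, k)


def stratified_sample(rows, is_rare, common_fraction, rare_fraction, rng):
    """Sample the rare and common strata at different rates.

    Returns (row, weight) pairs. The weight is how many original rows
    each sampled row stands for, so totals can still be estimated.
    """
    rare = [r for r in rows if is_rare(r)]
    common = [r for r in rows if not is_rare(r)]
    sample = []
    for stratum, fraction in ((rare, rare_fraction), (common, common_fraction)):
        if not stratum or fraction <= 0:
            continue
        # Keep at least one row from any non-empty stratum.
        k = min(len(stratum), max(1, round(fraction * len(stratum))))
        weight = len(stratum) / k
        sample.extend((r, weight) for r in rng.sample(stratum, k))
    return sample


def demo():
    rng = random.Random(42)
    # Hypothetical workload: 99,995 fast queries (5 ms), 5 slow ones (2000 ms).
    durations_ms = [5.0] * 99_995 + [2000.0] * 5

    def is_slow(ms):
        return ms > 1000.0  # assumed threshold, for illustration only

    plain = simple_random_sample(durations_ms, 0.01, rng)
    strat = stratified_sample(durations_ms, is_slow, 0.01, 1.0, rng)

    print("slow queries in plain 1% sample:", sum(is_slow(ms) for ms in plain))
    print("slow queries in stratified sample:",
          sum(is_slow(ms) for ms, _ in strat))


if __name__ == "__main__":
    demo()
```

With 5 slow queries in 100,000, a 1% simple random sample is expected to contain about 0.05 of them, so most runs see none at all. The stratified version keeps every slow query (rate 1.0) while still sampling the fast ones at 1%, and the per-row weights let you scale counts back up to the full data set.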