Tom Lane <[EMAIL PROTECTED]> writes:
> Greg Stark <[EMAIL PROTECTED]> writes:
> > And a 5% sample is pretty big. In fact my tests earlier showed the
> > i/o from 5% block sampling took just as long as reading all the
> > blocks. Even if we figure out what's causing that (IMHO surprising)
> > result and improve matters I would only expect it to be 3-4x faster
> > than a full scan.
>
> One way to reduce the I/O pain from extensive sampling would be to
> turn VACUUM ANALYZE into a genuine combined operation instead of a
> mere notational shorthand for two separate scans.
Except that we're also looking for every way we can to avoid having
vacuum touch every page, so that large tables with narrow hot spots can
still afford to be vacuumed regularly during peak hours. But for most
users analyze doesn't really have to run as often as vacuum; one
sequential scan per night doesn't seem like that big a deal to me.

> I'd still be worried about the CPU pain though. ANALYZE can afford to
> expend a pretty fair number of cycles per sampled tuple, but with a
> whole-table sample that's going to add up.

That is a concern, though the algorithm is pretty efficient: it
basically amounts to hashing all the tuples, keeping only a limited
number of hash buckets, and just throwing away the rest of the data.
Things could get more complicated if we move beyond simple n_distinct
estimates to more complex statistics, such as performing dependence
tests on the data for multi-column statistics.
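Just to make the CPU argument concrete, here's a toy sketch of the sort
of estimator I mean (along the lines of Wegman-style adaptive sampling;
this is standalone illustrative C, not backend code, and all the names
are invented):

/*
 * Keep a tuple's hash only while it falls in the current "bucket"
 * (its low `level` bits are zero).  When the kept set overflows,
 * raise the level, throw away roughly half the entries, and carry
 * on.  At the end, n_distinct is estimated as (kept count) << level.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define CAPACITY 1024           /* max hashes we're willing to keep */

static uint64_t sample[CAPACITY];
static int      nkept = 0;
static int      level = 0;      /* low `level` bits must be zero to keep */

/* 64-bit mixer (Murmur3 finalizer); any decent hash would do */
static uint64_t
hash64(uint64_t x)
{
    x ^= x >> 33;  x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;  x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return x;
}

static int
in_bucket(uint64_t h)
{
    return (h & ((1ULL << level) - 1)) == 0;
}

static void
add_value(uint64_t raw)
{
    uint64_t h = hash64(raw);
    int      i;

    if (!in_bucket(h))
        return;                 /* throw the tuple away */

    for (i = 0; i < nkept; i++) /* linear scan just to keep this short */
        if (sample[i] == h)
            return;             /* duplicate: already counted */

    sample[nkept++] = h;

    while (nkept == CAPACITY)   /* overflow: halve the bucket */
    {
        int j, k = 0;

        level++;
        for (j = 0; j < nkept; j++)
            if (in_bucket(sample[j]))
                sample[k++] = sample[j];
        nkept = k;
    }
}

int
main(void)
{
    /* feed it 1M "tuples" drawn from 50k distinct values */
    for (int i = 0; i < 1000000; i++)
        add_value((uint64_t) (rand() % 50000));

    printf("estimated n_distinct = %llu\n",
           (unsigned long long) nkept << level);
    return 0;
}

A real version would use an actual hash table rather than that linear
scan, but the point stands: the per-tuple cost is one hash plus a
membership test, and memory stays bounded no matter how big the table
gets.

-- 
greg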