Joshua D. Drake wrote:
postgres=# analyze verbose test_ten_million;
INFO:  analyzing "public.test_ten_million"
INFO:  "test_ten_million": scanned 3000 of 44248 pages, containing 678000
live rows and 0 dead rows; 3000 rows in sample, 10000048 estimated total
rows
ANALYZE
Time: 20145.148 ms

At ever larger table sizes, this turns into 3000 random seeks all over the drive, one at a time, because there's no async I/O here to queue requests any better than that for this access pattern. Let's say they take 10ms each, not an unrealistic figure on current hardware. That's 30 seconds, best case, which is similar to what JD's example shows even on a pretty small data set. Under load it could easily take over a minute, hammering the disks the whole time, and in a TOAST situation you're doing even more work. It's not outrageous and it doesn't scale linearly with table size, but it's not something you want to happen any more than you have to either; consider the poor client who is trying to get their work done while that is going on.
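
To make that arithmetic concrete, here's a rough sketch of the worst case; the ~10ms-per-seek figure is just the assumption from above, and the 3000 sampled pages come from JD's ANALYZE output:

-- Pages in the table vs. the 3000 pages ANALYZE sampled above:
SELECT relpages, reltuples FROM pg_class WHERE relname = 'test_ten_million';

-- Worst case for the sample: one random seek per sampled page, nothing cached,
-- at an assumed ~10 ms per seek:
SELECT 3000 * 10 / 1000.0 AS worst_case_seconds;  -- 30 seconds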

On smaller tables, you're both more likely to grab a useful next page via readahead and to have the data you need cached in RAM already. There are a couple of "shelves" in the time it takes ANALYZE to finish as you exceed L1/L2 CPU cache size and then RAM size, after which performance trails off as the seeks get longer and longer once the data you need is spread further across the disk(s). That the logical beginning of a drive is much faster than the logical end doesn't help either. I should generate that graph again one day, somewhere I can release it...
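
As a side note, a quick way to guess which side of those shelves a given table falls on is to compare its on-disk size against shared_buffers and the RAM left over for the OS cache; this isn't from the example above, just a sanity check you can run yourself:

-- If the table is much bigger than shared_buffers plus the free OS cache,
-- expect the seek-bound behavior described above rather than the cached case.
SELECT pg_size_pretty(pg_relation_size('test_ten_million')) AS table_size;
SHOW shared_buffers;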

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com


