[PERFORM] PostgreSQL caching
Hello, I have the following problem: When I run some query just after starting the postmaster, it takes several seconds to execute (sometimes more than 10); if I rerun it afterwards, it takes mere milliseconds. So I guess it has to do with PostgreSQL caching. But how exactly does it work? What does it cache? And how can I control it?

I would like to load selected information into memory before a user runs the query. Can I do that somehow? As PostgreSQL backs a website in my case, it doesn't really help if the user has to wait 10 seconds every time he goes to a new page (even if refreshing the page would be really quick, since PostgreSQL has already loaded the data into memory).

P.S. If the query or its EXPLAIN would help understanding, let me know.

Regards, Vitaly Belman

ICQ: 1912453 AIM: VitalyB1984 MSN: [EMAIL PROTECTED] Yahoo!: VitalyBe

---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] PostgreSQL caching
while you weren't looking, Vitaly Belman wrote:

> So, I guess it has to do with PostgreSQL caching. But how exactly does
> it work? What does it cache? And how can I control it?

PostgreSQL uses the operating system's disk cache. You can hint to the postmaster how much memory is available for caching with the effective_cache_size directive in your postgresql.conf. If you're running a *nix OS, you can find this by watching `top` for a while; in the header there's a "cached" value (or something to that effect). Watching this value, you can determine a rough average and set your effective_cache_size to that average, or perhaps slightly less. I'm not sure how to get this value on Windows.

Pgsql uses the OS's disk cache instead of its own cache management because the former is more likely to persist. If the postmaster managed the cache, as soon as the last connection died the memory allocated for caching would be released, and all the cached data would be lost. Relying instead on the OS to cache data means that, whether or not there's a postmaster, so long as there has been one, there'll be some data cached.

You can prepopulate the OS disk cache by periodically running, in a background process, a handful of SELECT queries that pull from your most commonly accessed tables. (A good way of doing that is simply to run your most commonly executed SELECTs.) Those queries take the performance hit of fetching from disk, while your regular queries hit the cache.

/rls

-- Rosser Schwarz Total Card, Inc.
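The prewarming idea above can be sketched as a small script run periodically from cron. This is a minimal sketch, not anything from the thread: the database name and queries are placeholders, and it assumes `psql` is on the PATH.

```python
# Sketch of the background-prewarming idea: periodically run your most
# common SELECTs through psql so their pages stay in the OS disk cache.
# "mydb" and the table names are placeholders -- substitute your own.
import subprocess

HOT_QUERIES = [
    "SELECT count(*) FROM bv_books;",
    "SELECT count(*) FROM bv_bookgenres;",
]

def warmup_commands(dbname, queries):
    """Build one psql invocation per query; callers discard the output,
    since only the disk reads matter."""
    return [["psql", "-d", dbname, "-c", q] for q in queries]

# To actually run them (requires a reachable server):
#   for cmd in warmup_commands("mydb", HOT_QUERIES):
#       subprocess.run(cmd, stdout=subprocess.DEVNULL)
```

Scheduling this every few minutes keeps the hot tables' blocks resident even when the database is otherwise idle.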
Re: [PERFORM] PostgreSQL caching
Vitaly Belman wrote:

> Hello, I have the following problem: When I run some query just after
> starting the postmaster, it takes several seconds to execute (sometimes
> more than 10); if I rerun it afterwards, it takes mere milliseconds. So
> I guess it has to do with PostgreSQL caching. But how exactly does it
> work? What does it cache? And how can I control it?

There are two areas of cache - PostgreSQL's shared buffers and the operating system's disk cache. You can't directly control what data is cached; it just keeps track of recently used data. It sounds like PG isn't being used for a while, so your OS decides to use its cache for webserver files.

> I would like to load selected information into memory before a user
> runs the query. Can I do that somehow?

If you could pin data in the cache it would run quicker, but at the cost of everything else running slower.

Suggested steps:
1. Read the configuration/tuning guide at: http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
2. Post a sample query/explain analyse that runs very slowly when not cached.
3. If need be, you can write a simple timed script that performs a query. Or, the autovacuum daemon might be what you want.

-- Richard Huxton Archonet Ltd
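The "simple timed script" in step 3 could look something like this. It is a sketch under assumptions, not from the thread: database name and SQL are placeholders, and it shells out to `psql` rather than using a driver.

```python
# Minimal timed-query script: measure wall-clock time for a query so you
# can compare cold-cache vs warm-cache runs.  "mydb" and the SQL are
# placeholders -- swap in the slow query from step 2.
import subprocess
import time

def timed(fn, *args):
    """Return how many seconds fn(*args) took."""
    start = time.monotonic()
    fn(*args)
    return time.monotonic() - start

def run_query(dbname, sql):
    """Run one statement through psql, discarding the rows."""
    subprocess.run(["psql", "-d", dbname, "-c", sql],
                   stdout=subprocess.DEVNULL, check=True)

# Example (requires a running server):
#   print(timed(run_query, "mydb", "SELECT count(*) FROM big_table;"))
```

Running it from cron and logging the result gives a cheap record of when queries fall out of cache.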
Re: [PERFORM] PostgreSQL caching
[EMAIL PROTECTED] (Richard Huxton) writes:

> If you could pin data in the cache it would run quicker, but at the
> cost of everything else running slower.
>
> Suggested steps:
> 1. Read the configuration/tuning guide at:
>    http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
> 2. Post a sample query/explain analyse that runs very slowly when not
>    cached.
> 3. If need be, you can write a simple timed script that performs a
>    query. Or, the autovacuum daemon might be what you want.

I don't think this case will be anywhere near so simple to resolve. I have seen this phenomenon occur when a query needs to pull a moderate number of blocks into memory to satisfy some moderate number of rows.

Let's say you need 2000 rows, which fit into 400 blocks. The first time the query runs, it needs to pull those 400 blocks off disk, which requires 400 reads of 8K of data. That can easily take a few seconds of I/O. The second time, not only are those blocks cached, they are probably cached in the buffer cache, so the I/O overhead disappears.

There's very likely no problem with the table statistics; they are leading to the right query plan, which happens to need to do 5 seconds of I/O to pull the data into memory. What is essentially required is the prescient caching algorithm, where the postmaster must consult /dev/esp in order to get a prediction of what blocks it may need to refer to in the next sixty seconds.

-- (format nil [EMAIL PROTECTED] cbbrowne cbbrowne.com) http://cbbrowne.com/info/linuxdistributions.html "Normally, we don't do people's homework around here, but Venice is a very beautiful city, so I'll make a small exception." --- Robert Redelmeier compromises his principles
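The 400-block example above can be checked with a line of arithmetic. The block counts come from the post; the ~10 ms per random read is an assumed figure for disks of that era, not a number from the thread.

```python
# Back-of-the-envelope check: 2000 rows spread over 400 8K blocks,
# read cold from disk.  SEEK_MS is an assumed average cost of one
# random 8K read, not a measured value.
BLOCKS = 400
SEEK_MS = 10  # assumption: ~10 ms per random read

cold_read_seconds = BLOCKS * SEEK_MS / 1000.0
print(cold_read_seconds)  # 4.0 -- "a few seconds of I/O", as described
```

Once the blocks are in cache, that four-second term vanishes, which is the whole cold-vs-warm difference the thread is about.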
Re: [PERFORM] PostgreSQL caching
> What is essentially required is the prescient caching algorithm, where
> the postmaster must consult /dev/esp in order to get a prediction of
> what blocks it may need to refer to in the next sixty seconds.

Easy enough. Television does it all the time with live shows. The guy with the buzzer always seems to know what will be said before they say it. All we need is a 5 to 10 second delay...
[PERFORM] Avoiding vacuum full on an UPDATE-heavy table
All,

I have a particularly troublesome table in my 7.3.4 database. It typically has less than 50k rows, and a usage pattern of about 1k INSERTs, 50-100k UPDATEs, and no DELETEs per day. It is vacuumed and analyzed three times per week. However, the performance of queries performed on this table slowly degrades over a period of weeks, until even a select count(*) takes several seconds. The only way I've found to restore performance is to VACUUM FULL the table, which is highly undesirable in our application due to the locks it imposes.

Here is the output of a psql session demonstrating the problem/solution. Note the \timing output after each of the SELECTs:

= vacuum analyze ;
NOTICE: VACUUM will be committed automatically
VACUUM
Time: 715900.74 ms

= select count(*) from ;
 count
-------
 17978
(1 row)
Time: 171789.08 ms

= vacuum full verbose ;
NOTICE: VACUUM will be committed automatically
INFO: --Relation public.--
INFO: Pages 188903: Changed 60, reaped 188896, Empty 0, New 0; Tup 17987: Vac 1469, Keep/VTL 0/0, UnUsed 9120184, MinLen 92, MaxLen 468; Re-using: Free/Avail. Space 1504083956/1504083872; EndEmpty/Avail. Pages 0/188901. CPU 6.23s/1.07u sec elapsed 55.02 sec.
INFO: Index __idx: Pages 29296; Tuples 17987: Deleted 1469. CPU 1.08s/0.20u sec elapsed 61.68 sec.
INFO: Index __idx: Pages 18412; Tuples 17987: Deleted 1469. CPU 0.67s/0.05u sec elapsed 17.90 sec.
INFO: Rel : Pages: 188903 --> 393; Tuple(s) moved: 17985. CPU 15.97s/19.11u sec elapsed 384.49 sec.
INFO: Index __idx: Pages 29326; Tuples 17987: Deleted 17985. CPU 1.14s/0.65u sec elapsed 32.34 sec.
INFO: Index __idx: Pages 18412; Tuples 17987: Deleted 17985. CPU 0.43s/0.32u sec elapsed 13.37 sec.
VACUUM
Time: 566313.54 ms

= select count(*) from ;
 count
-------
 17987
(1 row)
Time: 22.82 ms

Is there any way to avoid doing a periodic VACUUM FULL on this table, given the fairly radical usage pattern? Or is the (ugly) answer to redesign our application to avoid this usage pattern?
Also, how do I read the output of VACUUM FULL? http://www.postgresql.org/docs/7.3/interactive/sql-vacuum.html does not explain how to interpret the output, nor has Google helped. I have a feeling that the full vacuum is compressing hundreds of thousands of pages of sparse data into tens of thousands of pages of dense data, thus reducing the number of block reads by an order of magnitude, but I'm not quite sure how to read the output.

FWIW, this is last night's relevant output from the scheduled VACUUM ANALYZE. 24 days have passed since the VACUUM FULL above:

INFO: --Relation public.xxx--
INFO: Index xxx_yyy_idx: Pages 30427; Tuples 34545: Deleted 77066. CPU 1.88s/0.51u sec elapsed 95.39 sec.
INFO: Index xxx_zzz_idx: Pages 19049; Tuples 34571: Deleted 77066. CPU 0.83s/0.40u sec elapsed 27.92 sec.
INFO: Removed 77066 tuples in 3474 pages. CPU 0.38s/0.32u sec elapsed 1.33 sec.
INFO: Pages 13295: Changed 276, Empty 0; Tup 34540: Vac 77066, Keep 0, UnUsed 474020. Total CPU 3.34s/1.29u sec elapsed 125.00 sec.
INFO: Analyzing public.xxx

Best Regards, Bill Montgomery
Re: [PERFORM] PostgreSQL caching
Rosser Schwarz wrote:

> PostgreSQL uses the operating system's disk cache.

... in addition to its own buffer cache, which is stored in shared memory. You're correct, though, in that the best practice is to keep the PostgreSQL cache small and give more memory to the operating system's disk cache.

> Pgsql uses the OS's disk cache instead of its own cache management
> because the former is more likely to persist. If the postmaster managed
> the cache, as soon as the last connection died, the memory allocated
> for caching would be released, and all the cached data would be lost.

No; the cache is stored in shared memory. It wouldn't persist over postmaster restarts (without some scheme of saving and restoring it), but that has nothing to do with why the OS disk cache is usually kept larger than the PG shared buffer cache.

-Neil
Re: [PERFORM] Avoiding vacuum full on an UPDATE-heavy table
> Is there any way to avoid doing a periodic VACUUM FULL on this table,
> given the fairly radical usage pattern? Or is the (ugly) answer to
> redesign our application to avoid this usage pattern?

Yes, you should be able to avoid a periodic VACUUM FULL. The problem is that your table needs to be vacuumed MUCH more often. What should happen is that, assuming you have enough FSM space allocated and you vacuum the right amount, your table will reach a steady-state size. As you can see from your vacuum verbose output, your table was almost entirely dead space.

pg_autovacuum would probably help, as it monitors activity and vacuums tables accordingly. It is not included with 7.3.x, but if you download it and compile it yourself it will work against a 7.3.x server.

Good luck,

Matthew
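For reference, the "enough FSM space" part is set in postgresql.conf. The parameter names are the real 7.3-era settings, but the values below are illustrative assumptions, not numbers from the thread:

```
# Illustrative postgresql.conf fragment (7.3-era) -- values are assumptions.
# Size max_fsm_pages to at least the number of pages dirtied between
# vacuums, so plain VACUUM can remember and reuse all the dead space.
max_fsm_pages = 200000        # default was far smaller
max_fsm_relations = 1000      # >= number of tables and indexes
```

Changing these requires a postmaster restart on 7.3.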
Re: [PERFORM] PostgreSQL caching
Not knowing a whole lot about the internals of Pg, one thing jumped out at me: each trip to get data from bv_books took 2.137 ms, which came to over 4.2 seconds right there. The problem seems to be the 1993 times that the nested loop spins, as almost all of the time is spent there.

Personally, I am amazed that it takes 3.585 seconds to index scan i_bookgenres_genre_id. Is that a composite index? Analyzing the tables may help, as the optimizer appears to mispredict the number of rows returned.

I would be curious to see how it performs with an IN clause, which I would suspect would go quite a bit faster. Try the following:

SELECT bv_books.*, vote_avg, vote_count
FROM bv_bookgenres, bv_books
WHERE bv_books.book_id IN (
        SELECT book_id
        FROM bv_genres
        WHERE bv_bookgenres.genre_id = 5830
      )
  AND bv_bookgenres.genre_id = 5830
ORDER BY vote_avg DESC
LIMIT 10 OFFSET 0;

In this query, all of the book_id values are pulled at once. Who knows? If you get statistics on this, please post.

Marty
Re: [PERFORM] Avoiding vacuum full on an UPDATE-heavy table
>>>>> "BM" == Bill Montgomery <[EMAIL PROTECTED]> writes:

BM> Is there any way to avoid doing a periodic VACUUM FULL on this table,
BM> given the fairly radical usage pattern? Or is the (ugly) answer to
BM> redesign our application to avoid this usage pattern?

I'll bet upgrading to 7.4.2 clears up your problems. I'm not sure if it was in 7.3 or 7.4 where the index bloat problem was solved. Try to see if just reindexing will help your performance.

Also, run a plain vacuum at least nightly so that your table size stays reasonable. It won't take much time on a table with only 50k rows in it.

--
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: [EMAIL PROTECTED]       Rockville, MD  +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/
Re: [PERFORM] Avoiding vacuum full on an UPDATE-heavy table
Matthew T. O'Connor wrote:

>> Is there any way to avoid doing a periodic VACUUM FULL on this table,
>> given the fairly radical usage pattern? Or is the (ugly) answer to
>> redesign our application to avoid this usage pattern?
>
> pg_autovacuum would probably help, as it monitors activity and vacuums
> tables accordingly. It is not included with 7.3.x, but if you download
> it and compile it yourself it will work against a 7.3.x server.

As a quick fix, since we're upgrading to 7.4.2 in a few weeks anyhow (which includes pg_autovacuum), I've simply set up an hourly vacuum on this table. It only takes ~4 seconds to execute when kept up on an hourly basis. Is there any penalty to vacuuming too frequently, other than the time wasted in an unnecessary vacuum operation?

My hourly VACUUM VERBOSE output now looks like this:

INFO: --Relation public.--
INFO: Index __idx: Pages 30452; Tuples 34990: Deleted 1226. CPU 0.67s/0.18u sec elapsed 0.87 sec.
INFO: Index __idx: Pages 19054; Tuples 34991: Deleted 1226. CPU 0.51s/0.13u sec elapsed 1.35 sec.
INFO: Removed 1226 tuples in 137 pages. CPU 0.01s/0.00u sec elapsed 1.30 sec.
INFO: Pages 13709: Changed 31, Empty 0; Tup 34990: Vac 1226, Keep 0, UnUsed 567233. Total CPU 1.58s/0.31u sec elapsed 3.91 sec.
INFO: Analyzing public.
VACUUM

With regards to Vivek's post about index bloat, I tried REINDEXing before I did a VACUUM FULL a month ago, when performance had gotten dismal. It didn't help :-(

Best Regards, Bill Montgomery
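An hourly vacuum like the one described above is a one-line crontab entry. The database name, table name, and log path below are placeholders, not details from the thread:

```
# Hypothetical crontab entry: plain VACUUM ANALYZE on the hot table, hourly.
0 * * * *  psql -d mydb -c 'VACUUM ANALYZE hot_table;' >> /var/log/pg_vacuum.log 2>&1
```

Plain VACUUM takes no exclusive lock, so unlike VACUUM FULL this is safe to run while the application is live.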
Re: [PERFORM] Avoiding vacuum full on an UPDATE-heavy table
Bill Montgomery <[EMAIL PROTECTED]> writes:

> I have a particularly troublesome table in my 7.3.4 database. It
> typically has less than 50k rows, and a usage pattern of about 1k
> INSERTs, 50-100k UPDATEs, and no DELETEs per day. It is vacuumed and
> analyzed three times per week.

You probably want to vacuum (non-FULL) once a day, if not more often. Also take a look at your FSM settings --- it seems like a good bet that they're not large enough to remember all the free space in your database.

With adequate FSM the table should stabilize at a physical size corresponding to number-of-live-rows + number-of-updates-between-VACUUMs, which would be three times the minimum possible size if you vacuum once a day (50K + 100K), or five times if you stick to every other day (50K + 200K). Your VACUUM FULL output shows that the table had bloated to hundreds of times the minimum size:

INFO: Rel : Pages: 188903 --> 393; Tuple(s) moved: 17985.

and AFAIK the only way that will happen is if you fail to vacuum at all or don't have enough FSM.

The indexes are looking darn large as well. In 7.3 about the only thing you can do about this is REINDEX the table every so often. 7.4 should behave better, though.

regards, tom lane
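The steady-state estimate above reduces to a line of arithmetic; all figures here are taken from the post.

```python
# Steady-state table size with adequate FSM: live rows plus row versions
# created between vacuums (figures from the post above).
def steady_state_rows(live_rows, updates_per_day, days_between_vacuums):
    return live_rows + updates_per_day * days_between_vacuums

daily = steady_state_rows(50_000, 100_000, 1)       # vacuum once a day
alternate = steady_state_rows(50_000, 100_000, 2)   # vacuum every other day

print(daily // 50_000, alternate // 50_000)  # 3 5 -- 3x and 5x the minimum size
```

Compare that 3x-5x steady state with the ~480x bloat (188903 pages down to 393) shown in the VACUUM FULL output to see how far gone the table was.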
[PERFORM] tuning for AIX 5L with large memory
I will soon have at my disposal a new IBM pSeries server. The main mission for this box will be to serve several pg databases. I have ordered 8GB of RAM and want to learn the best way to tune pg and AIX for this configuration.

Specifically, I am curious about shared memory limitations. I've had to tune shmmax on Linux machines before, but I'm new to AIX and not sure if this is even required on that platform. Google has not been much help for specifics here. Hoping someone else here has a similar platform and can offer some advice.

Thanks!

-Dan Harris
Re: [PERFORM] tuning for AIX 5L with large memory
Clinging to sanity, [EMAIL PROTECTED] (Dan Harris) mumbled into her beard:

> I will soon have at my disposal a new IBM pSeries server. The main
> mission for this box will be to serve several pg databases. I have
> ordered 8GB of RAM and want to learn the best way to tune pg and AIX
> for this configuration.
>
> Specifically, I am curious about shared memory limitations. I've had
> to tune shmmax on Linux machines before, but I'm new to AIX and not
> sure if this is even required on that platform.

We have a couple of these at work; they're nice and fast, although the process of compiling things, well, makes me feel a little unclean. One of our sysadmins did all the configuring-OS-stuff part; I don't recall offhand if there was a need to twiddle something in order to get it to have great gobs of shared memory.

A quick Google on this gives me the impression that AIX supports, out of the box, multiple GB of shared memory without special kernel configuration. A DB/2 configuration guide tells users of Solaris and HP/UX that they need to set shmmax in sundry config files and reboot; there is no such instruction for AIX. If it needs configuring, it's probably somewhere in SMIT. And you can always try starting up an instance to see how big it'll let you make shared memory.

The usual rule of thumb has been that having substantially more than 1 blocks worth of shared memory is not worthwhile. I don't think anyone has done a detailed study on AIX to see if bigger numbers play well or not. I would think that having more than about 1 to 1.5GB of shared memory in use for buffer cache would start playing badly, but I have no numbers.

-- select 'cbbrowne' || '@' || 'cbbrowne.com'; http://www3.sympatico.ca/cbbrowne/sap.html
Would-be National Mottos: USA: "We don't care where you come from. We can't find our *own* country on a map..."
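For comparison with the Linux side of the question, the SysV shared-memory ceiling can be inspected and raised like this. This is a Linux sketch only; per the discussion above, AIX is reported to need no such tuning, and the 1 GB value is just an example.

```shell
# Linux: inspect the SysV shared-memory ceiling (bytes).
cat /proc/sys/kernel/shmmax

# Example (run as root) of raising it to 1 GB -- illustrative value only:
#   sysctl -w kernel.shmmax=1073741824
```

A setting like this must also exceed the shared_buffers allocation requested by the postmaster, or it will fail to start.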
Re: [PERFORM] tuning for AIX 5L with large memory
Christopher Browne wrote:

> One of our sysadmins did all the configuring OS stuff part; I don't
> recall offhand if there was a need to twiddle something in order to get
> it to have great gobs of shared memory.

FWIW, the section on configuring kernel resources under various Unixen[1] doesn't have any documentation for AIX. If someone out there knows which knobs need to be tweaked, would they mind sending in a doc patch? (Or just specifying what needs to be done, and I'll add the SGML.)

-Neil

[1] http://developer.postgresql.org/docs/postgres/kernel-resources.html#SYSVIPC