[PERFORM] PostgreSQL caching

2004-05-21 Thread Vitaly Belman
Hello,

I have the following problem:

When I run some query right after starting the postmaster, it takes
several seconds to execute (sometimes more than 10); if I rerun it
afterwards, it takes mere milliseconds.

So, I guess it has to do with PostgreSQL caching. But how exactly
does it work? What does it cache? And how can I control it?

I would like to load selected information into memory before a user
runs the query. Can I do it somehow? As PostgreSQL backs a webserver
in my case, it isn't really helping if the user has to wait 10
seconds every time he goes to a new page (even though refreshing the
page would be really quick, since PostgreSQL has already loaded the
data into memory).

P.S. If the query or its EXPLAIN output would help understanding,
let me know.

Regards,
 Vitaly Belman
 
 ICQ: 1912453
 AIM: VitalyB1984
 MSN: [EMAIL PROTECTED]
 Yahoo!: VitalyBe


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] PostgreSQL caching

2004-05-21 Thread Rosser Schwarz
while you weren't looking, Vitaly Belman wrote:

> So, I guess it has to do with PostgreSQL caching. But how exactly
> does it work? What does it cache? And how can I control it?

PostgreSQL uses the operating system's disk cache.  You can hint to
the postmaster how much memory is available for caching with the
effective_cache_size directive in your postgresql.conf.  If you're
running a *nix OS, you can find this by watching `top` for a while;
in the header, there's a cached value (or something to that effect).
Watching this value, you can determine a rough average and set your
effective_cache_size to that rough average, or perhaps slightly less.
I'm not sure how to get this value on Windows.
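As a sketch, the setting might look like this in postgresql.conf (the figure is purely illustrative; in 7.x the value is counted in 8 kB disk pages, so derive it from the average cached value you observed):

```ini
# Illustrative only -- derive the number from your own `top` output.
# effective_cache_size is measured in 8 kB pages, so if the OS is
# caching roughly 800 MB on average, that's about 100000 pages.
effective_cache_size = 100000
```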

Pgsql uses the OS's disk cache instead of its own cache management
because the former is more likely to persist.  If the postmaster
managed the cache, as soon as the last connection died, the memory
allocated for caching would be released, and all the cached data
would be lost.  Relying instead on the OS to cache data means that,
whether or not there's a postmaster, so long as there has been one,
there'll be some data cached.

You can prepopulate the OS disk cache by periodically running a
handful of SELECT queries that pull from your most commonly accessed
tables in a background process.  (A good way of doing that is simply
to run your most commonly executed SELECTS.)  Those queries should
take the performance hit of fetching from disk, while your regular
queries hit the cache.
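A minimal sketch of such a warming pass (the table and column names here are hypothetical placeholders, not from the original post):

```sql
-- Run these periodically from a background job so the relevant blocks
-- stay in the OS disk cache.  In 7.x, count(*) forces a full scan,
-- which pulls the whole table through the cache.
SELECT count(*) FROM orders WHERE order_date > CURRENT_DATE - 7;
SELECT count(*) FROM customers;
```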

/rls

--
Rosser Schwarz
Total Card, Inc.




Re: [PERFORM] PostgreSQL caching

2004-05-21 Thread Richard Huxton
Vitaly Belman wrote:
> Hello,
> I have the following problem:
> When I run some query right after starting the postmaster, it takes
> several seconds to execute (sometimes more than 10); if I rerun it
> afterwards, it takes mere milliseconds.
> So, I guess it has to do with PostgreSQL caching. But how exactly
> does it work? What does it cache? And how can I control it?
There are two areas of cache: PostgreSQL's shared buffers and the 
operating system's disk cache. You can't directly control what data is 
cached; the system just keeps track of recently used data. It sounds 
like PG isn't being used for a while, so your OS decides to use its 
cache for webserver files.

> I would like to load selected information into memory before a user
> runs the query. Can I do it somehow? As PostgreSQL backs a webserver
> in my case, it isn't really helping if the user has to wait 10
> seconds every time he goes to a new page (even though refreshing the
> page would be really quick, since PostgreSQL has already loaded the
> data into memory).
If you could pin data in the cache it would run quicker, but at the 
cost of everything else running slower.

Suggested steps:
1. Read the configuration/tuning guide at:
  http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
2. Post a sample query and its EXPLAIN ANALYSE output that runs very 
slowly when not cached.
3. If need be, you can write a simple timed script that performs a 
query. Or, the autovacuum daemon might be what you want.
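Such a timed script can be as small as a single cron entry (the database, table, and schedule here are made up for illustration):

```shell
# crontab fragment: re-run a hot query every 15 minutes so its data
# stays cached; output is discarded.
*/15 * * * *  psql -d mydb -c 'SELECT count(*) FROM hot_table' >/dev/null 2>&1
```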

--
  Richard Huxton
  Archonet Ltd


Re: [PERFORM] PostgreSQL caching

2004-05-21 Thread Chris Browne
[EMAIL PROTECTED] (Richard Huxton) writes:
> If you could pin data in the cache it would run quicker, but at the
> cost of everything else running slower.
>
> Suggested steps:
> 1. Read the configuration/tuning guide at:
>    http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
> 2. Post a sample query/explain analyse that runs very slowly when not
> cached.
> 3. If need be, you can write a simple timed script that performs a
> query. Or, the autovacuum daemon might be what you want.

I don't think this case will be anywhere near so simple to resolve.

I have seen this phenomenon occur when a query needs to pull a
moderate number of blocks into memory in order to return a moderate
number of rows.

Let's say you need 2000 rows, which fit into 400 blocks.

The first time the query runs, it needs to pull those 400 blocks off
disk, which requires 400 reads of 8K of data.  That can easily take a
few seconds of I/O.

The second time, not only are those blocks cached, they are probably
cached in the buffer cache, so that the I/O overhead disappears.

There's very likely no problem with the table statistics; they are
leading to the right query plan, which happens to need to do 5 seconds
of I/O to pull the data into memory.

What is essentially required is the prescient caching algorithm,
where the postmaster must consult /dev/esp in order to get a
prediction of what blocks it may need to refer to in the next sixty
seconds.
-- 
(format nil [EMAIL PROTECTED] cbbrowne cbbrowne.com)
http://cbbrowne.com/info/linuxdistributions.html
Normally, we don't do people's homework around here, but Venice is a
very beautiful city, so I'll make a small exception.
--- Robert Redelmeier compromises his principles



Re: [PERFORM] PostgreSQL caching

2004-05-21 Thread Rod Taylor
> What is essentially required is the prescient caching algorithm,
> where the postmaster must consult /dev/esp in order to get a
> prediction of what blocks it may need to refer to in the next sixty
> seconds.

Easy enough. Television does it all the time with live shows. The guy
with the buzzer always seems to know what will be said before they say
it. All we need is a 5 to 10 second delay...




[PERFORM] Avoiding vacuum full on an UPDATE-heavy table

2004-05-21 Thread Bill Montgomery
All,
I have a particularly troublesome table in my 7.3.4 database. It 
typically has less than 50k rows, and a usage pattern of about 1k 
INSERTs, 50-100k UPDATEs, and no DELETEs per day. It is vacuumed and 
analyzed three times per week. However, the performance of queries 
performed on this table slowly degrades over a period of weeks, until 
even a select count(*) takes several seconds. The only way I've found 
to restore performance is to VACUUM FULL the table, which is highly 
undesirable in our application due to the locks it imposes.

Here is the output of a psql session demonstrating the problem/solution. 
Note the \timing output after each of the SELECTs:

= vacuum analyze ;
NOTICE:  VACUUM will be committed automatically
VACUUM
Time: 715900.74 ms
= select count(*) from ;
count
---
17978
(1 row)
Time: 171789.08 ms
= vacuum full verbose ;
NOTICE:  VACUUM will be committed automatically
INFO:  --Relation public.--
INFO:  Pages 188903: Changed 60, reaped 188896, Empty 0, New 0; Tup 
17987: Vac 1469, Keep/VTL 0/0, UnUsed 9120184, MinLen 92, MaxLen 468; 
Re-using: Free/Avail. Space 1504083956/1504083872; EndEmpty/Avail. Pages 
0/188901.
   CPU 6.23s/1.07u sec elapsed 55.02 sec.
INFO:  Index __idx: Pages 29296; Tuples 17987: Deleted 1469.
   CPU 1.08s/0.20u sec elapsed 61.68 sec.
INFO:  Index __idx: Pages 18412; Tuples 17987: Deleted 1469.
   CPU 0.67s/0.05u sec elapsed 17.90 sec.
INFO:  Rel : Pages: 188903 --> 393; Tuple(s) moved: 17985.
   CPU 15.97s/19.11u sec elapsed 384.49 sec.
INFO:  Index __idx: Pages 29326; Tuples 17987: Deleted 17985.
   CPU 1.14s/0.65u sec elapsed 32.34 sec.
INFO:  Index __idx: Pages 18412; Tuples 17987: Deleted 17985.
   CPU 0.43s/0.32u sec elapsed 13.37 sec.
VACUUM
Time: 566313.54 ms
= select count(*) from ;
count
---
17987
(1 row)

Time: 22.82 ms
Is there any way to avoid doing a periodic VACUUM FULL on this table, 
given the fairly radical usage pattern? Or is the (ugly) answer to 
redesign our application to avoid this usage pattern?

Also, how do I read the output of VACUUM FULL? 
http://www.postgresql.org/docs/7.3/interactive/sql-vacuum.html does not 
explain how to interpret the output, nor has google helped. I have a 
feeling that the full vacuum is compressing hundreds of thousands of 
pages of sparse data into tens of thousands of pages of dense data, thus 
reducing the number of block reads by an order of magnitude, but I'm not 
quite sure how to read the output.

FWIW, this is last night's relevant output from the scheduled VACUUM 
ANALYZE. 24 days have passed since the VACUUM FULL above:

INFO:  --Relation public.xxx--
INFO:  Index xxx_yyy_idx: Pages 30427; Tuples 34545: Deleted 77066.
   CPU 1.88s/0.51u sec elapsed 95.39 sec.
INFO:  Index xxx_zzz_idx: Pages 19049; Tuples 34571: Deleted 77066.
   CPU 0.83s/0.40u sec elapsed 27.92 sec.
INFO:  Removed 77066 tuples in 3474 pages.
   CPU 0.38s/0.32u sec elapsed 1.33 sec.
INFO:  Pages 13295: Changed 276, Empty 0; Tup 34540: Vac 77066, Keep 0, 
UnUsed 474020.
   Total CPU 3.34s/1.29u sec elapsed 125.00 sec.
INFO:  Analyzing public.xxx

Best Regards,
Bill Montgomery


Re: [PERFORM] PostgreSQL caching

2004-05-21 Thread Neil Conway
Rosser Schwarz wrote:
> PostgreSQL uses the operating system's disk cache.
... in addition to its own buffer cache, which is stored in shared 
memory. You're correct though, in that the best practice is to keep the 
PostgreSQL cache small and give more memory to the operating system's 
disk cache.
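A common shape for that split in postgresql.conf might be (the numbers are illustrative, not a recommendation; both settings are counted in 8 kB pages in 7.x):

```ini
shared_buffers = 10000          # PG's own buffer cache, ~80 MB
effective_cache_size = 100000   # hint: OS disk cache is ~800 MB
```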

> Pgsql uses the OS's disk cache instead of its own cache management
> because the former is more likely to persist.  If the postmaster
> managed the cache, as soon as the last connection died, the memory
> allocated for caching would be released, and all the cached data
> would be lost.
No; the cache is stored in shared memory. It wouldn't persist over 
postmaster restarts (without some scheme of saving and restoring it), 
but that has nothing to do with why the OS disk cache is usually kept 
larger than the PG shared buffer cache.

-Neil


Re: [PERFORM] Avoiding vacuum full on an UPDATE-heavy table

2004-05-21 Thread Matthew T. O'Connor
> Is there any way to avoid doing a periodic VACUUM FULL on this table,
> given the fairly radical usage pattern? Or is the (ugly) answer to
> redesign our application to avoid this usage pattern?

Yes, you should be able to avoid periodic VACUUM FULLs.  The problem
is that your table needs to be vacuumed MUCH more often.  What should
happen is that, assuming you have enough FSM space allocated and
vacuum often enough, your table will reach a steady-state size.
As you can see from your VACUUM VERBOSE output, your table was almost
entirely dead space.

pg_autovacuum would probably help, as it monitors activity and vacuums
tables accordingly.  It is not included with 7.3.x, but if you download
it and compile it yourself it will work against a 7.3.x server.

Good luck,

Matthew





Re: [PERFORM] PostgreSQL caching

2004-05-21 Thread Marty Scholes
Not knowing a whole lot about the internals of Pg, one thing jumped out 
at me: each trip to get data from bv_books took 2.137 ms, which 
came to over 4.2 seconds right there.

The problem seems to be the 1993 times that the nested loop spins, as 
almost all of the time is spent there.

Personally, I am amazed that it takes 3.585 seconds to index scan 
i_bookgenres_genre_id.  Is that a composite index?  Analyzing the 
tables may help, as the optimizer appears to mispredict the number of 
rows returned.

I would be curious to see how it performs with an IN clause, which I 
would suspect would go quite a bit faster.  Try the following:

SELECT bv_books.*,
       vote_avg,
       vote_count
FROM   bv_bookgenres,
       bv_books
WHERE  bv_books.book_id IN (
           SELECT book_id
           FROM   bv_bookgenres
           WHERE  genre_id = 5830
       )
AND    bv_bookgenres.genre_id = 5830
ORDER BY vote_avg DESC LIMIT 10 OFFSET 0;
In this query, all of the book_id values are pulled at once.
Who knows?
If you get statistics on this, please post.
Marty


Re: [PERFORM] Avoiding vacuum full on an UPDATE-heavy table

2004-05-21 Thread Vivek Khera
>>>>> "BM" == Bill Montgomery [EMAIL PROTECTED] writes:

BM Is there any way to avoid doing a periodic VACUUM FULL on this table,
BM given the fairly radical usage pattern? Or is the (ugly) answer to
BM redesign our application to avoid this usage pattern?

I'll bet upgrading to 7.4.2 clears up your problems.  I'm not sure if
it was in 7.3 or 7.4 where the index bloat problem was solved.  Try to
see if just reindexing will help your performance.  Also, run a plain
vacuum at least nightly so that your table size stays reasonable.  It
won't take much time on a table with only 50k rows in it.
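A sketch of both steps (the table name is a placeholder):

```sql
-- One-off: rebuild bloated indexes (takes a lock while it runs).
REINDEX TABLE my_table;

-- Recurring: a plain, non-FULL vacuum, e.g. run nightly via
--   psql -d mydb -c 'VACUUM ANALYZE my_table;'
VACUUM ANALYZE my_table;
```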


-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD  +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/



Re: [PERFORM] Avoiding vacuum full on an UPDATE-heavy table

2004-05-21 Thread Bill Montgomery
Matthew T. O'Connor wrote:
>> Is there any way to avoid doing a periodic VACUUM FULL on this table,
>> given the fairly radical usage pattern? Or is the (ugly) answer to
>> redesign our application to avoid this usage pattern?
>
> pg_autovacuum would probably help, as it monitors activity and vacuums
> tables accordingly.  It is not included with 7.3.x, but if you download
> it and compile it yourself it will work against a 7.3.x server.

As a quick fix, since we're upgrading to 7.4.2 in a few weeks anyhow 
(which includes pg_autovacuum), I've simply set up an hourly vacuum on 
this table. It only takes ~4 seconds to execute when kept up on an 
hourly basis. Is there any penalty to vacuuming too frequently, other 
than the time wasted in an unnecessary vacuum operation?

My hourly VACUUM VERBOSE output now looks like this:
INFO:  --Relation public.--
INFO:  Index __idx: Pages 30452; Tuples 34990: Deleted 1226.
   CPU 0.67s/0.18u sec elapsed 0.87 sec.
INFO:  Index __idx: Pages 19054; Tuples 34991: Deleted 1226.
   CPU 0.51s/0.13u sec elapsed 1.35 sec.
INFO:  Removed 1226 tuples in 137 pages.
   CPU 0.01s/0.00u sec elapsed 1.30 sec.
INFO:  Pages 13709: Changed 31, Empty 0; Tup 34990: Vac 1226, Keep 0, 
UnUsed 567233.
   Total CPU 1.58s/0.31u sec elapsed 3.91 sec.
INFO:  Analyzing public.
VACUUM

With regards to Vivek's post about index bloat, I tried REINDEXing 
before I did a VACUUM FULL a month ago when performance had gotten 
dismal. It didn't help :-(

Best Regards,
Bill Montgomery


Re: [PERFORM] Avoiding vacuum full on an UPDATE-heavy table

2004-05-21 Thread Tom Lane
Bill Montgomery [EMAIL PROTECTED] writes:
> I have a particularly troublesome table in my 7.3.4 database. It
> typically has less than 50k rows, and a usage pattern of about 1k
> INSERTs, 50-100k UPDATEs, and no DELETEs per day. It is vacuumed and
> analyzed three times per week.

You probably want to vacuum (non-FULL) once a day, if not more often.
Also take a look at your FSM settings --- it seems like a good bet that
they're not large enough to remember all the free space in your
database.
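In 7.3 those knobs live in postgresql.conf; a sketch (the numbers are illustrative and should be sized from your VACUUM VERBOSE output, which reports how many pages hold free space):

```ini
# The free space map must be able to remember every page with
# reusable space, across all tables, between vacuums.
max_fsm_pages = 200000      # pages with free space to track
max_fsm_relations = 1000    # distinct relations to track
```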

With adequate FSM the table should stabilize at a physical size
corresponding to number-of-live-rows + number-of-updates-between-VACUUMs, 
which would be three times the minimum possible size if you vacuum once
a day (50K + 100K) or five times if you stick to every-other-day
(50K + 200K).  Your VACUUM FULL output shows that the table had bloated
to hundreds of times the minimum size:

> INFO:  Rel : Pages: 188903 --> 393; Tuple(s) moved: 17985.

and AFAIK the only way that will happen is if you fail to vacuum at all
or don't have enough FSM.

The indexes are looking darn large as well.  In 7.3 about the only thing
you can do about this is REINDEX the table every so often.  7.4 should
behave better though.

regards, tom lane



[PERFORM] tuning for AIX 5L with large memory

2004-05-21 Thread Dan Harris
I will soon have at my disposal a new IBM pSeries server.  The main 
mission for this box will be to serve several  pg databases.  I have 
ordered 8GB of RAM and want to learn the best way to tune pg and AIX for 
this configuration.  Specifically, I am curious about shared memory 
limitations.  I've had to tune shmmax on Linux machines before, but 
I'm new to AIX and am not sure whether this is even required on that 
platform.  Google has not been much help for specifics here.

Hoping someone else here has a similar platform and can offer some advice.
Thanks!
-Dan Harris


Re: [PERFORM] tuning for AIX 5L with large memory

2004-05-21 Thread Christopher Browne
Clinging to sanity, [EMAIL PROTECTED] (Dan Harris) mumbled into her beard:
> I will soon have at my disposal a new IBM pSeries server.  The main
> mission for this box will be to serve several pg databases.  I have
> ordered 8GB of RAM and want to learn the best way to tune pg and AIX
> for this configuration.  Specifically, I am curious about shared
> memory limitations.  I've had to tune the shmmax on linux machines
> before but I'm new to AIX and not sure if this is even required on
> that platform?  Google has not been much help for specifics here.
>
> Hoping someone else here has a similar platform and can offer some advice..

We have a couple of these at work; they're nice and fast, although the
process of compiling things, well, makes me feel a little unclean.

One of our sysadmins did all the OS configuration; I don't
recall offhand if there was a need to twiddle something in order to
get it to have great gobs of shared memory.

A quick Google on this gives me the impression that AIX supports, out
of the box, multiple GB of shared memory without special kernel
configuration.  A DB/2 configuration guide tells users of Solaris and
HP/UX that they need to set shmmax in sundry config files and reboot.
No such instruction for AIX.

If it needs configuring, it's probably somewhere in SMIT.  And you can
always try starting up an instance to see how big it'll let you make
shared memory.

The usual rule of thumb has been that having substantially more than
1 blocks worth of shared memory is unworthwhile.  I don't think
anyone has done a detailed study on AIX to see if bigger numbers play
well or not.  I would think that having more than about 1 to 1.5GB of
shared memory in use for buffer cache would start playing badly, but I
have no numbers.
-- 
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://www3.sympatico.ca/cbbrowne/sap.html
Would-be National Mottos:
USA: We don't care where you come from.  We can't find our *own*
country on a map...



Re: [PERFORM] tuning for AIX 5L with large memory

2004-05-21 Thread Neil Conway
Christopher Browne wrote:
> One of our sysadmins did all the configuring OS stuff part; I don't
> recall offhand if there was a need to twiddle something in order to
> get it to have great gobs of shared memory.
FWIW, the section on configuring kernel resources under various 
Unixen[1] doesn't have any documentation for AIX. If someone out there 
knows which knobs need to be tweaked, would they mind sending in a doc 
patch? (Or just specifying what needs to be done, and I'll add the SGML.)

-Neil
[1] 
http://developer.postgresql.org/docs/postgres/kernel-resources.html#SYSVIPC
